Parallelizing the Fast Fourier Transform David Monismith cs599


Outline

• An example use of the Fast Fourier Transform (FFT): audio processing.
• Explanation of the Discrete Fourier Transform.
• Making use of divide and conquer to implement the recursive FFT (Cooley-Tukey algorithm).
• Creating an iterative algorithm.
• Parallelization of the FFT.


Examples: Audio Processing

• Compression
– In audio and video processing, often only certain frequencies can be heard or seen.
– The FFT or a similar tool can be used to remove frequency data that cannot be seen or heard.
– Only the important data (the frequencies we care about) needs to be stored.
– An inverse FFT (or similar operation) can be applied to decompress the data.
• Audio synthesis
– Frequencies can be quickly added/adjusted and converted to a signal (a sound) by using the inverse FFT.
– Such operations are often applied in audio synthesizers.


Band Pass Filter Algorithm

• Convert the signal to the frequency domain with the FFT.
• Multiply the desired frequencies by 1.
• Multiply the remaining frequencies by zero.
• Apply the inverse FFT to convert the signal back to the space-time domain.
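The band pass steps can be sketched in Python. This is a minimal illustration rather than a production filter: to stay self-contained it uses a naive O(n^2) transform instead of an FFT, it follows the positive-exponent convention of the pseudocode in these slides, and the names `dft`, `idft`, `band_pass`, and `keep` are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive forward transform: X[k] = sum_t x[t] * exp(+2*pi*i*t*k/n)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft(): opposite exponent sign and a 1/n scale factor."""
    n = len(X)
    return [sum(X[k] * cmath.exp(-2j * cmath.pi * t * k / n) for k in range(n)) / n
            for t in range(n)]

def band_pass(signal, keep):
    """Multiply bins in `keep` by 1 and all other bins by 0, then invert."""
    spectrum = dft(signal)
    filtered = [X if k in keep else 0 for k, X in enumerate(spectrum)]
    return [v.real for v in idft(filtered)]

n = 16
low = [math.cos(2 * math.pi * 1 * t / n) for t in range(n)]
high = [math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
mixed = [a + b for a, b in zip(low, high)]

# A real cosine at frequency f occupies bins f and n - f, so both are kept.
recovered = band_pass(mixed, keep={1, n - 1})
```

Here `recovered` matches the low-frequency component up to rounding error, because the bins belonging to the higher frequency were multiplied by zero.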


Discrete Fourier Transform

• Converts a sequence of values from the space-time domain to the frequency domain.
• Useful for signal processing.
• The standard DFT is too slow for practical use on large inputs: it requires O(n^2) operations.
• Notice that each output value is a summation over all of the components in the space-time sequence.


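Discrete Fourier Transform

The principal nth root of unity is denoted ω_n. Using the sign convention of the pseudocode in these slides, it is

$$\omega_n = e^{2\pi i / n}.$$

The DFT of a sequence x_0, …, x_{n-1} can be performed as the matrix-vector product

$$X_k = \sum_{j=0}^{n-1} x_j \, \omega_n^{jk}, \qquad k = 0, \dots, n-1.$$

Many values in the matrix of powers ω_n^{jk} are repeated, but the repetition is not obvious.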
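A direct evaluation of the DFT as a matrix-vector product makes both the O(n^2) cost and the repetition visible. The sketch below assumes the root of unity is e^(2πi/n), matching the positive exponent used by the pseudocode in these slides; the function names are hypothetical.

```python
import cmath

def root_of_unity(n):
    """Principal nth root of unity, assuming the positive-exponent convention."""
    return cmath.exp(2j * cmath.pi / n)

def dft_matrix(n):
    """n x n matrix whose (row, col) entry is the root of unity raised to row*col."""
    w = root_of_unity(n)
    return [[w ** (row * col) for col in range(n)] for row in range(n)]

def naive_dft(x):
    """O(n^2) DFT: one length-n dot product per output value."""
    n = len(x)
    F = dft_matrix(n)
    return [sum(F[k][t] * x[t] for t in range(n)) for k in range(n)]

def distinct_entries(n, digits=9):
    """Distinct matrix entries after rounding away floating-point noise."""
    return {(round(z.real, digits), round(z.imag, digits))
            for row in dft_matrix(n) for z in row}

print(len(distinct_entries(8)), "distinct values among", 8 * 8, "entries")
```

The matrix has n^2 entries but only n distinct values, which is the repetition the FFT exploits.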


Example of Repetition

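Assume that N = 5. Notice that the first power of the root of unity is:

$$\omega_5^{1} = e^{2\pi i / 5}$$

The sixth power of the root of unity is the same value, because the exponent wraps around modulo N:

$$\omega_5^{6} = e^{12\pi i / 5} = e^{2\pi i} \, e^{2\pi i / 5} = e^{2\pi i / 5} = \omega_5^{1}$$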
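The repetition for N = 5 can be checked numerically. This small sketch assumes the root of unity is e^(2πi/N); `omega_power` is a hypothetical helper name.

```python
import cmath

N = 5

def omega_power(p, n=N):
    """The pth power of the principal nth root of unity (positive exponent assumed)."""
    return cmath.exp(2j * cmath.pi * p / n)

first = omega_power(1)
sixth = omega_power(6)

# Powers repeat with period N, so only the exponent modulo N matters.
same_up_to_rounding = abs(first - sixth) < 1e-12
```

Because only the exponent modulo N matters, any power p can be replaced by `p % N` before the exponential is evaluated.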


Fast Fourier Transform (FFT)

• The repetition in the powers of the roots of unity is easiest to exploit when the array size is a power of two.
• This can be taken advantage of with a divide and conquer algorithm called the FFT.
• The FFT can compute a DFT in O(n lg n) time by multiplying array values by the appropriate root of unity and adding the array value found at an appropriate stride.
• This algorithm allows for fast compression and manipulation of signals.


Divide and Conquer in the FFT

x[0] x[1] x[2] x[3] x[4] x[5] x[6] x[7]

s1[0] s1[1] s1[2] s1[3] s1[4] s1[5] s1[6] s1[7]

s2[0] s2[1] s2[2] s2[3] s2[4] s2[5] s2[6] s2[7]

X[0] X[1] X[2] X[3] X[4] X[5] X[6] X[7]


Cooley-Tukey FFT Algorithm

y = fft(x, n, stride)
  if(n == 1)
    y[0] = x[0];
  else
    //Transform the even-indexed and odd-indexed halves separately
    y1 = fft(x, n/2, 2*stride);
    y2 = fft(x+stride, n/2, 2*stride);

    //Combine the two half-size transforms
    for(int i = 0; i < n/2; i++)
      y[i] = y1[i] + e^((2*PI*I*i)/n)*y2[i];
      y[i+n/2] = y1[i] + e^((2*PI*I*(i+n/2))/n)*y2[i];
    end for
  end if
end fft


Example Code

• Let’s quickly take a look at some example code for the recursive FFT.
• Given an array of 8 values:
[7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2]
• Matlab (and Octave) tell us that the FFT is:
[37.60 + 0.00*i, -6.24 + 1.13*i, 7.70 + 12.10*i, 3.24 + 21.93*i, 9.00 + 0.00*i, 3.24 - 21.93*i, 7.70 - 12.10*i, -6.24 - 1.13*i]
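One way the recursive algorithm might look in Python is sketched below. It is an illustration under stated assumptions, not the author's example code: list slicing stands in for the pointer-and-stride arguments of the pseudocode, the input length must be a power of two, and the exponent is positive as in the pseudocode. The input is the eight data values from the example above. A library that uses the negative-exponent convention returns the complex conjugates of these results for real input.

```python
import cmath

def fft_recursive(x):
    """Recursive radix-2 FFT (positive exponent); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft_recursive(x[0::2])  # plays the role of fft(x, n/2, 2*stride)
    odd = fft_recursive(x[1::2])   # plays the role of fft(x+stride, n/2, 2*stride)
    y = [0j] * n
    for i in range(n // 2):
        twiddle = cmath.exp(2j * cmath.pi * i / n)
        y[i] = even[i] + twiddle * odd[i]
        # exp(2*pi*I*(i + n/2)/n) equals -twiddle, hence the subtraction.
        y[i + n // 2] = even[i] - twiddle * odd[i]
    return y

values = [7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2]
spectrum = fft_recursive(values)
for k, v in enumerate(spectrum):
    print(f"X[{k}] = {v.real:.2f} {v.imag:+.2f}*i")
```

Each level of recursion does O(n) work across all of its calls and there are lg n levels, which gives the O(n lg n) running time.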


Iterative FFT

y = fft(x, n) {
  r = ceil(log2(n));
  //Allocate arrays R and S to be of size n
  R = x;

  for(m = 0; m < r; m++) {
    S = R; //Copy all n elements of R into S
    //Elements to add at each stage differ in exactly one bit.
    bit = 1 << (r - m - 1);
    notBit = ~bit;
    for(i = 0; i < n; i++) {
      j = i & notBit;
      k = i | bit;
      expFactor = revAndShift(i, r, m);
      R[i] = S[j] + S[k] * cexp( (2*PI*I*expFactor)/n );
    }
  }
  y = R; //Note: the results in y are indexed in bit-reversed order
}
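The iterative pseudocode translates to Python roughly as follows. This is a sketch under assumptions: the input length is a power of two, `rev_and_shift` mirrors the slides' revAndShift helper, and `bit_reverse` is a hypothetical addition. The stage loop leaves its results in bit-reversed index order (position p holds the value for frequency index bit-reverse(p)), so the sketch applies one final permutation to return the values in natural order.

```python
import cmath

def rev_and_shift(i, r, m):
    """Keep the top m+1 of i's r bits, reverse them, and pad with r-m-1 zeros."""
    i >>= (r - m - 1)
    result = 0
    for _ in range(m + 1):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result << (r - m - 1)

def bit_reverse(i, r):
    """Reverse the r-bit binary representation of i."""
    result = 0
    for _ in range(r):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result

def fft_iterative(x):
    n = len(x)
    r = n.bit_length() - 1  # log2(n) when n is a power of two
    R = [complex(v) for v in x]
    for m in range(r):
        S = list(R)  # snapshot of the previous stage
        bit = 1 << (r - m - 1)
        for i in range(n):
            j = i & ~bit  # partner index with the stage bit cleared
            k = i | bit   # partner index with the stage bit set
            exp_factor = rev_and_shift(i, r, m)
            R[i] = S[j] + S[k] * cmath.exp(2j * cmath.pi * exp_factor / n)
    # Undo the bit-reversed ordering produced by the stage loop.
    return [R[bit_reverse(i, r)] for i in range(n)]

spectrum = fft_iterative([7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2])
```

The copy into `S` at the start of each stage matters: every `R[i]` must be computed from the previous stage's values, not from values already overwritten in the current stage.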


Reverse and Shift Function

//Given i = b0 b1 b2 … br-1 obtain bm bm-1 … b0 0 … 0
//Note that there are r - m - 1 zeros.
result = revAndShift(unsigned int i, int r, int m) {
  i = i >> (r-m-1); //remove unwanted bits
  result = 0;
  for(int j = 0; j < m+1; j++) {
    result |= i & 1;
    i = i >> 1;
    if(j < m)
      result = result << 1;
  }

  //pad result with zeros
  result = result << (r-m-1);
  return result;
}
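A small Python version of the helper, with a worked case, may make the bit manipulation easier to follow. The function name `rev_and_shift` and the chosen inputs are illustrative; b0 denotes the most significant of the r bits, as in the comment above.

```python
def rev_and_shift(i, r, m):
    """Return b_m b_(m-1) ... b_0 followed by r-m-1 zeros, where b_0 is i's top bit."""
    i >>= (r - m - 1)            # keep only the top m+1 bits: b_0 ... b_m
    result = 0
    for _ in range(m + 1):       # peel bits off the low end, push them on the high end
        result = (result << 1) | (i & 1)
        i >>= 1
    return result << (r - m - 1)  # pad with zeros back to r bits

# Worked case with r = 3 and i = 0b100, so b0 b1 b2 = 1 0 0.
r = 3
i = 0b100
exponents = [rev_and_shift(i, r, m) for m in range(r)]
# m = 0: b0 0 0   = 100
# m = 1: b1 b0 0  = 010
# m = 2: b2 b1 b0 = 001
```

These values are the exponents of the root of unity that index 4 uses in stages 0, 1, and 2 of an 8-point transform.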


Parallelizing the FFT

• Assume the number of processes (i.e. running programs) is a power of 2.
• The number of processes will be referred to as npes.
• Assume the array size (N) is a power of 2 and is larger than the number of processes.
• Partition the array into npes chunks of N/npes elements each.
• Assign one chunk to each process.
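The partitioning rule can be written as a short helper. This is a sketch with hypothetical names (`chunk_bounds`, `is_power_of_two`); it checks the power-of-two assumptions stated above and returns a half-open index range for each rank.

```python
def is_power_of_two(v):
    return v > 0 and (v & (v - 1)) == 0

def chunk_bounds(n, npes, rank):
    """Half-open range [start, end) of the array elements owned by `rank`."""
    if not (is_power_of_two(n) and is_power_of_two(npes)):
        raise ValueError("n and npes must both be powers of two")
    if npes > n:
        raise ValueError("there must be at least one element per process")
    work_to_do = n // npes
    start = rank * work_to_do
    return start, start + work_to_do

# Eight elements over four processes: two elements per process.
layout = [chunk_bounds(8, 4, rank) for rank in range(4)]
```

With N = 8 and npes = 4, rank 0 owns elements 0 and 1, rank 1 owns elements 2 and 3, and so on.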


Parallelizing the FFT

x[0] x[1] x[2] x[3] x[4] x[5] x[6] x[7]

s1[0] s1[1] s1[2] s1[3] s1[4] s1[5] s1[6] s1[7]

s2[0] s2[1] s2[2] s2[3] s2[4] s2[5] s2[6] s2[7]

X[0] X[1] X[2] X[3] X[4] X[5] X[6] X[7]

Process 0 Process 1 Process 2 Process 3


Parallelizing the FFT

• Notice that every process must send and receive data in each of the first lg(npes) stages.
• Additionally, notice that in a stage where data transfer must occur, each process sends its data to one partner process and receives data from that same partner.
• This operation can be accomplished using a function called MPI_Sendrecv from the Message Passing Interface API.
• We will investigate the algorithm to perform this operation next.
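The pairing of processes in each stage can be previewed without running MPI. The sketch below only computes which rank would be the partner, using the splitPoint rule from the parallel pseudocode in these slides; `exchange_partner` is a hypothetical name, and in a real program each pairing would correspond to one MPI_Sendrecv call.

```python
def exchange_partner(rank, npes, m):
    """Rank exchanged with in stage m, or None when the stage is purely local."""
    split_point = npes // (1 << (m + 1))
    if split_point == 0:
        return None  # stage m >= lg(npes): both operands are already local
    if rank % (2 * split_point) < split_point:
        return rank + split_point  # lower half of the pair
    return rank - split_point      # upper half of the pair

npes = 4
stage0 = [exchange_partner(rank, npes, 0) for rank in range(npes)]  # 0<->2, 1<->3
stage1 = [exchange_partner(rank, npes, 1) for rank in range(npes)]  # 0<->1, 2<->3
stage2 = [exchange_partner(rank, npes, 2) for rank in range(npes)]  # no exchange
```

The pairing is symmetric: if rank a is told to exchange with rank b, then rank b is told to exchange with rank a, which is what makes a combined send-and-receive call a natural fit.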


Parallel Algorithm

//Rank is the process id and npes is the number of processes
y = fft(x, n, rank, npes) {
  r = ceil(log2(n));
  workToDo = n/npes;
  start = rank*workToDo;
  end = start + workToDo;
  //Allocate arrays S, Sk, and R of size workToDo

  R = x[start…end-1];

  for(int m = 0; m < r; m++) {
    Sk = S = R;
    bit = 1 << (r - m - 1), notbit = ~bit;
    splitPoint = npes / (1 << (m+1));


Parallel FFT Algorithm, Cont’d

    if(splitPoint > 0) {
      if( ( rank % (splitPoint << 1) ) < splitPoint)
        Send S to process rank + splitPoint, and receive Sk from rank + splitPoint.
      else
        Send Sk to process rank - splitPoint, and receive S from rank - splitPoint.
    } else
      Sk = S;


Parallel FFT Algorithm, Cont’d

    for(int i = start, l = 0; l < workToDo; i++, l++) {
      j = (i & notbit) % workToDo;
      k = (i | bit) % workToDo;
      expFactor = revAndShift(i, r, m);
      R[l] = S[j] + Sk[k] * e^( (2*PI*I*expFactor)/n );
    }
  }
  y = R;
}
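The parallel algorithm can be tried out in a single Python process by simulating the ranks. This is a sketch under assumptions, not an MPI program: each rank's chunk is an entry in a list, reading the partner's chunk stands in for the send/receive exchange, and all function names are hypothetical. As in the iterative version, the stage loop leaves results in bit-reversed index order, so a final reordering step is included for comparison with the earlier example.

```python
import cmath

def rev_and_shift(i, r, m):
    i >>= (r - m - 1)
    result = 0
    for _ in range(m + 1):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result << (r - m - 1)

def bit_reverse(i, r):
    result = 0
    for _ in range(r):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result

def parallel_fft_simulated(x, npes):
    """Simulate npes ranks; returns the per-rank chunks concatenated."""
    n = len(x)
    r = n.bit_length() - 1
    work = n // npes
    # chunks[rank] is the array R owned by that rank.
    chunks = [[complex(v) for v in x[p * work:(p + 1) * work]] for p in range(npes)]
    for m in range(r):
        bit = 1 << (r - m - 1)
        split_point = npes // (1 << (m + 1))
        new_chunks = []
        for rank in range(npes):
            S = Sk = chunks[rank]
            if split_point > 0:
                if rank % (2 * split_point) < split_point:
                    Sk = chunks[rank + split_point]  # stands in for receiving Sk
                else:
                    S = chunks[rank - split_point]   # stands in for receiving S
            start = rank * work
            R = []
            for l in range(work):
                i = start + l
                j = (i & ~bit) % work
                k = (i | bit) % work
                w = cmath.exp(2j * cmath.pi * rev_and_shift(i, r, m) / n)
                R.append(S[j] + Sk[k] * w)
            new_chunks.append(R)
        chunks = new_chunks  # every rank finishes a stage before the next begins
    return [v for chunk in chunks for v in chunk]

def to_natural_order(values):
    r = len(values).bit_length() - 1
    return [values[bit_reverse(i, r)] for i in range(len(values))]

data = [7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2]
spectrum = to_natural_order(parallel_fft_simulated(data, npes=4))
```

Building `new_chunks` before replacing `chunks` plays the role of the synchronization that the paired exchange provides in a real run: no rank reads a partner's data from the current stage.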

