CS 6068 Parallel Computing, Fall 2013
Lecture 10, Nov 18: The Parallel FFT
Prof. Fred Annexstein
Office Hours: 10-11 MWF or by appointment
Tel: 513-556-1807
Meeting: Mondays 6:00-8:50PM Baldwin 645
Outline
• Fourier analysis
• Discrete Fourier transform
• Fast Fourier transform
• Parallel implementation
Discrete Fourier Transform
• Many applications in science and engineering
• Examples
  – Voice recognition
  – Image processing
• Straightforward implementation: $\Theta(n^2)$
• Fast Fourier transform: $\Theta(n \log n)$
• Parallel FFT: $\Theta(\log n)$ using $n$ processors
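One rough way to see the gap between the first two bounds is to count complex multiplications. The sketch below assumes $n$ is a power of two and uses a simplified cost model; the function names are illustrative only. The direct product $F_n x$ costs $n^2$ multiplications. A radix-2 FFT performs $n/2$ twiddle multiplications in each of its $\log_2 n$ stages. With $n$ processors, each stage can run in constant time, which is where the $\Theta(\log n)$ parallel bound comes from.

```python
def naive_dft_mults(n):
    """Complex multiplications in the direct matrix-vector product F_n x."""
    return n * n


def radix2_fft_mults(n):
    """Twiddle multiplications in a radix-2 FFT: n/2 per stage, log2(n) stages.

    Assumes n is a power of two; trivial multiplications by 1 are still counted.
    """
    stages = n.bit_length() - 1  # log2(n) for a power of two
    return (n // 2) * stages


for n in (16, 1024, 2**20):
    print(n, naive_dft_mults(n), radix2_fft_mults(n))
```

For $n = 1024$ this gives 1,048,576 versus 5,120 multiplications. Tuned libraries can also skip the trivial multiplications by $\pm 1$ and $\pm i$, so their real counts are lower still.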
Fourier Analysis
• Fourier analysis: Represent periodic continuous functions by (potentially infinite) series of sine and cosine functions
• Discrete Fourier transform: Map a sequence over time to another sequence over frequency
  – Signal strength as a function of time
  – Fourier coefficients as a function of frequency
DFT Example (1/4)
16 data points representing signal strength over time
DFT Example (2/4)
DFT yields amplitudes and frequencies of sine/cosine functions
DFT Example (3/4)
Plot of four constituent sine/cosine functions and their sum
DFT Example (4/4)
Continuous function and original 16 samples.
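The figures for this four-slide example are not reproduced here. As a substitute, the sketch below builds a hypothetical 16-sample signal (not the data on the slides) from two components: a cosine of amplitude 2 at frequency 1 and a sine of amplitude 1 at frequency 3. A direct DFT then recovers the amplitudes. The sketch assumes the slides' convention $\omega = e^{2\pi i/n}$. For real input, the amplitude at frequency $0 < k < n/2$ is $2|y_k|/n$.

```python
import cmath
import math


def dft(x):
    """Direct DFT: y[k] = sum_j x[j] * w**(j*k), with w = e^(2*pi*i/n) (slide convention)."""
    n = len(x)
    w = cmath.exp(2j * math.pi / n)
    return [sum(x[j] * w ** (j * k) for j in range(n)) for k in range(n)]


n = 16
# Hypothetical signal: amplitude-2 cosine at frequency 1 plus amplitude-1 sine at frequency 3.
samples = [2 * math.cos(2 * math.pi * 1 * t / n) + math.sin(2 * math.pi * 3 * t / n)
           for t in range(n)]
y = dft(samples)

# For real input, the amplitude of frequency k (0 < k < n/2) is 2|y[k]|/n.
amplitudes = [2 * abs(y[k]) / n for k in range(1, n // 2)]
print([round(a, 6) for a in amplitudes])  # prints [2.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
```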
DFT of Speech Sample
“An gorra cats are furrier...”
Figure panels: signal; frequency and amplitude
Figure courtesy Ron Cole and Yeshwant Muthusamy of the Oregon Graduate Institute
Computing DFT
• Matrix-vector product $F_n x$
  – $x$ is the input vector ($n$ signal samples)
  – $F_n$ is the $n$th-order Fourier matrix
  – $f_{i,j} = \omega_n^{ij}$ for $0 \le i, j < n$, where $\omega_n$ is a primitive $n$th root of unity
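The matrix-vector formulation translates directly into code. The sketch below builds $F_n$ entry by entry and multiplies it by the input, which costs $\Theta(n^2)$ operations. The function names are hypothetical, and the sketch follows the slides' sign convention $\omega_n = e^{2\pi i/n}$.

```python
import cmath
import math


def fourier_matrix(n):
    """F_n with entries f[i][j] = w**(i*j), where w = e^(2*pi*i/n) is a primitive nth root of unity."""
    w = cmath.exp(2j * math.pi / n)
    return [[w ** (i * j) for j in range(n)] for i in range(n)]


def dft_matvec(x):
    """DFT as the matrix-vector product F_n x: Theta(n^2) multiply-adds."""
    n = len(x)
    F = fourier_matrix(n)
    return [sum(F[i][j] * x[j] for j in range(n)) for i in range(n)]


print(dft_matvec([1, 2, 4, 3]))
```

For the input (1, 2, 4, 3), it returns (10, -3 - i, 0, -3 + i) up to floating-point rounding.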
Discrete Fourier Transform
• Given a polynomial $A(x) = a_0 + a_1 x + \dots + a_{n-1} x^{n-1}$, evaluate it at $n$ distinct points $x_0, \dots, x_{n-1}$.
• Key idea: choose $x_k = \omega^k$, where $\omega$ is the principal $n$th root of unity.
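The same transform can be viewed as evaluating $A(x)$ at the points $x_k = \omega^k$. The sketch below does this with Horner's rule; the names are illustrative. Each evaluation costs $\Theta(n)$, so evaluating at all $n$ points is still $\Theta(n^2)$. Choosing the roots of unity as the points does not make this direct evaluation cheaper. It matters because it enables the divide-and-conquer FFT.

```python
import cmath
import math


def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + a_{n-1}*x**(n-1) by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


def dft_by_evaluation(coeffs):
    """DFT as polynomial evaluation at the points x_k = w**k, where w = e^(2*pi*i/n)."""
    n = len(coeffs)
    w = cmath.exp(2j * math.pi / n)
    return [horner(coeffs, w ** k) for k in range(n)]


y = dft_by_evaluation([2, 3])
print(y)
```

With coefficients (2, 3), this reproduces Example 1: $A(1) = 5$ and $A(-1) = -1$.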
Example 1
• Compute DFT of vector (2, 3)
• $\omega_2$, the primitive square root of unity, is $-1$
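Worked out as a matrix-vector product with $f_{i,j} = \omega_2^{ij}$ and $\omega_2 = -1$:

$$F_2 x = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix} = \begin{pmatrix} 2 + 3 \\ 2 - 3 \end{pmatrix} = \begin{pmatrix} 5 \\ -1 \end{pmatrix}$$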
Example 2
• Compute DFT of vector (1, 2, 4, 3)
• The primitive 4th root of unity is $i$
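Worked out with $\omega_4 = i$, so that $f_{i,j} = i^{ij}$. This uses the slides' positive-exponent convention. A library that uses $e^{-2\pi i/n}$ instead, such as NumPy's `numpy.fft.fft`, would swap the second and fourth entries for this real input.

$$F_4 x = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 + 2 + 4 + 3 \\ 1 + 2i - 4 - 3i \\ 1 - 2 + 4 - 3 \\ 1 - 2i - 4 + 3i \end{pmatrix} = \begin{pmatrix} 10 \\ -3 - i \\ 0 \\ -3 + i \end{pmatrix}$$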
Roots of Unity
• Def. An nth root of unity is a complex number x such that xn = 1.
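• Fact. The $n$th roots of unity are $\omega^0, \omega^1, \dots, \omega^{n-1}$, where $\omega = e^{2\pi i/n}$.
• Pf. $(\omega^k)^n = (e^{2\pi i k/n})^n = (e^{\pi i})^{2k} = (-1)^{2k} = 1$.
• Fact. The $(n/2)$th roots of unity are $\nu^0, \nu^1, \dots, \nu^{n/2-1}$, where $\nu = e^{4\pi i/n}$.
• Fact. $\omega^2 = \nu$ and $(\omega^2)^k = \nu^k$.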
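Figure ($n = 8$): the 8th roots of unity on the unit circle: $\omega^0 = \nu^0 = 1$, $\omega^1$, $\omega^2 = \nu^1 = i$, $\omega^3$, $\omega^4 = \nu^2 = -1$, $\omega^5$, $\omega^6 = \nu^3 = -i$, $\omega^7$.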
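The facts about roots of unity can be checked numerically for $n = 8$, the case shown in the figure. The sketch below uses only Python's built-in `cmath` module. Its asserts use a small tolerance because floating-point roots are only approximate. The last check is the property the FFT relies on: $\omega^{k+n/2} = -\omega^k$.

```python
import cmath
import math

n = 8
omega = cmath.exp(2j * math.pi / n)  # principal nth root of unity
nu = cmath.exp(4j * math.pi / n)     # principal (n/2)th root of unity

roots = [omega ** k for k in range(n)]

# Every omega**k is an nth root of unity: (omega**k)**n == 1 up to rounding.
assert all(abs(r ** n - 1) < 1e-9 for r in roots)
# Squaring maps the nth roots onto the (n/2)th roots: (omega**k)**2 == nu**k.
assert all(abs(roots[k] ** 2 - nu ** k) < 1e-9 for k in range(n))
# Halving property used by the FFT: omega**(k + n/2) == -omega**k.
assert all(abs(roots[k + n // 2] + roots[k]) < 1e-9 for k in range(n // 2))
```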
Fast Fourier Transform via Divide and Conquer
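• Goal. Evaluate a degree $n-1$ polynomial $A(x) = a_0 + \dots + a_{n-1} x^{n-1}$ at its $n$th roots of unity: $\omega^0, \omega^1, \dots, \omega^{n-1}$.
• Divide. Break the polynomial up into even and odd powers.
  – $A_{\text{even}}(x) = a_0 + a_2 x + a_4 x^2 + \dots + a_{n-2} x^{n/2-1}$
  – $A_{\text{odd}}(x) = a_1 + a_3 x + a_5 x^2 + \dots + a_{n-1} x^{n/2-1}$
  – $A(x) = A_{\text{even}}(x^2) + x\,A_{\text{odd}}(x^2)$
• Conquer. Evaluate $A_{\text{even}}(x)$ and $A_{\text{odd}}(x)$ at the $(n/2)$th roots of unity: $\nu^0, \nu^1, \dots, \nu^{n/2-1}$.
• Combine. Using $\nu^k = (\omega^k)^2 = (\omega^{k+n/2})^2$ and $\omega^{k+n/2} = -\omega^k$:
  – $A(\omega^k) = A_{\text{even}}(\nu^k) + \omega^k A_{\text{odd}}(\nu^k)$, for $0 \le k < n/2$
  – $A(\omega^{k+n/2}) = A_{\text{even}}(\nu^k) - \omega^k A_{\text{odd}}(\nu^k)$, for $0 \le k < n/2$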
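The divide and combine identities can be checked numerically before they are used inside a recursion. The sketch uses a hypothetical degree-7 polynomial and plain Horner evaluation. It confirms $A(\omega^k) = A_{\text{even}}(\nu^k) + \omega^k A_{\text{odd}}(\nu^k)$ and $A(\omega^{k+n/2}) = A_{\text{even}}(\nu^k) - \omega^k A_{\text{odd}}(\nu^k)$ for every $0 \le k < n/2$.

```python
import cmath
import math


def poly(coeffs, x):
    """Evaluate sum of coeffs[j] * x**j by Horner's rule."""
    result = 0
    for a in reversed(coeffs):
        result = result * x + a
    return result


a = [1, 2, 4, 3, 5, 0, 7, 6]       # hypothetical coefficients, n = 8
n = len(a)
a_even, a_odd = a[0::2], a[1::2]  # coefficients of A_even and A_odd
omega = cmath.exp(2j * math.pi / n)

for k in range(n // 2):
    wk = omega ** k
    nu_k = wk * wk  # nu**k = (omega**k)**2
    e, d = poly(a_even, nu_k), poly(a_odd, nu_k)
    assert abs(poly(a, wk) - (e + wk * d)) < 1e-9                    # A(omega**k)
    assert abs(poly(a, omega ** (k + n // 2)) - (e - wk * d)) < 1e-9  # A(omega**(k+n/2))
```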
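FFT Algorithm
fft(n, a[0], a[1], …, a[n-1]) {
   if (n == 1) return a[0]
   (e[0], e[1], …, e[n/2-1]) ← fft(n/2, a[0], a[2], a[4], …, a[n-2])   // even-indexed coefficients
   (d[0], d[1], …, d[n/2-1]) ← fft(n/2, a[1], a[3], a[5], …, a[n-1])   // odd-indexed coefficients
   for k = 0 to n/2 - 1 {
      ω^k ← e^(2πik/n)                 // twiddle factor
      y[k]       ← e[k] + ω^k · d[k]
      y[k + n/2] ← e[k] - ω^k · d[k]
   }
   return (y[0], y[1], …, y[n-1])
}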
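Below is a runnable Python version of the recursive fft pseudocode, offered as a sketch rather than a tuned implementation. It assumes the input length is a power of two. It also keeps the slides' sign convention, so its output corresponds to `n * numpy.fft.ifft(a)` rather than `numpy.fft.fft(a)`.

```python
import cmath
import math


def fft(a):
    """Recursive radix-2 FFT; len(a) must be a power of two.

    Uses the slides' convention: y[k] = A(omega**k), with omega = e^(2*pi*i/n).
    """
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    e = fft(a[0::2])  # A_even at the (n/2)th roots of unity
    d = fft(a[1::2])  # A_odd at the (n/2)th roots of unity
    y = [0j] * n
    for k in range(n // 2):
        wk = cmath.exp(2j * math.pi * k / n)  # twiddle factor omega**k
        y[k] = e[k] + wk * d[k]
        y[k + n // 2] = e[k] - wk * d[k]
    return y


print(fft([1, 2, 4, 3]))
```

`fft([1, 2, 4, 3])` returns (10, -3 - i, 0, -3 + i) up to rounding, matching Example 2. Slicing with `a[0::2]` and `a[1::2]` copies the data at every level. An in-place iterative version avoids those copies, at the cost of an explicit bit-reversal reordering of the input.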
Odd-Even Recursion Tree
• Level 0: a0, a1, a2, a3, a4, a5, a6, a7
• Level 1: a0, a2, a4, a6 | a1, a3, a5, a7
• Level 2: a0, a4 | a2, a6 | a1, a5 | a3, a7
• Leaves: a0 a4 a2 a6 a1 a5 a3 a7, in "bit-reversed" order of their indices: 000 100 010 110 001 101 011 111
• Figure annotation: perfect shuffle
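The leaf order of the recursion tree can be generated directly by reversing the bits of each index. Here is a small sketch; the helper names are hypothetical.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i, e.g. 0b001 -> 0b100 for bits=3."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)  # shift the lowest bit of i into r
        i >>= 1
    return r


def bit_reversal_order(n):
    """Leaf order of the odd-even recursion tree for n = 2**bits inputs."""
    bits = n.bit_length() - 1
    return [bit_reverse(i, bits) for i in range(n)]


print(bit_reversal_order(8))  # prints [0, 4, 2, 6, 1, 5, 3, 7]
```

Reversing the bits twice gives back the original index. The permutation is therefore its own inverse, so applying it in place only requires swapping pairs.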
Phases of Parallel FFT Algorithm
• Phase 1: Processes permute the a's (global bit-reversal data communication pattern)
• Phase 2:
  – First $\log n - \log p$ iterations of the FFT
  – Handled in shared memory; no global communication is required
• Phase 3:
  – Final $\log p$ iteration steps must be handled globally
  – Organized as a logical hypercube
  – In each iteration, every process swaps values with its partner across one hypercube dimension
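The three phases can be simulated sequentially to see which butterfly stages need communication. This is a hypothetical sketch, not the lecture's code. It assumes $n$ and $p$ are powers of two, and that process $r$ owns the contiguous block of $n/p$ entries starting at $r \cdot n/p$. It runs an iterative radix-2 FFT after the bit-reversal permutation. Butterfly partners $j$ and $j + h$ differ only in the bit worth $h$, so they sit on the same process exactly when $h < n/p$. That holds in the first $\log n - \log p$ stages. In the last $\log p$ stages, process $r$ pairs with process $r \oplus (h \cdot p/n)$, its neighbor across one hypercube dimension.

```python
import cmath
import math


def parallel_fft_simulation(a, p):
    """Sequentially simulate the three-phase parallel FFT (hypothetical sketch).

    Assumes n = len(a) and p are powers of two with p <= n, and that process r
    owns indices [r * n/p, (r + 1) * n/p). Returns (y, stage_kinds), where y is
    the DFT with the slides' convention omega = e^(2*pi*i/n) and stage_kinds
    labels each butterfly stage "local" or "global".
    """
    n = len(a)
    bits = n.bit_length() - 1
    block = n // p

    # Phase 1: global bit-reversal permutation of the input.
    y = [complex(a[int(format(i, "b").zfill(bits)[::-1], 2)]) for i in range(n)]

    stage_kinds = []
    half = 1
    while half < n:
        # Partners j and j + half differ only in the bit worth `half`:
        # same process if half < block (Phase 2); otherwise the partner is
        # process r XOR (half // block), one hypercube dimension away (Phase 3).
        stage_kinds.append("local" if half < block else "global")
        w_step = cmath.exp(2j * math.pi / (2 * half))  # principal (2*half)th root of unity
        for start in range(0, n, 2 * half):
            w = 1
            for j in range(start, start + half):
                t = w * y[j + half]
                y[j], y[j + half] = y[j] + t, y[j] - t
                w *= w_step
        half *= 2
    return y, stage_kinds


y, kinds = parallel_fft_simulation([1, 2, 4, 3, 5, 0, 7, 6], p=4)
print(kinds)  # prints ['local', 'global', 'global']
```

For $n = 8$ and $p = 4$, there is $\log 8 - \log 4 = 1$ local stage followed by $\log 4 = 2$ global stages.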
FFT in Practice: Sequential and Parallel
• Fastest Fourier Transform in the West (FFTW) [Frigo and Johnson]
  – Optimized C library
  – Features: DFT, DCT, real, complex, any size, any dimension
  – Won the 1999 Wilkinson Prize for Numerical Software
  – Portable, competitive with vendor-tuned code
• The NVIDIA CUDA Fast Fourier Transform library (cuFFT) provides a simple interface for computing FFTs up to 10x faster (NVIDIA's figure; the baseline is not stated here). By using hundreds of processor cores inside NVIDIA GPUs, cuFFT delivers the floating-point performance of a GPU without requiring you to develop your own custom GPU FFT.
Reference: http://www.fftw.org
Summary
• The discrete Fourier transform is used in many scientific and engineering applications
• The fast Fourier transform is important because it implements the DFT in $\Theta(n \log n)$ time
• Developed a parallel implementation of the FFT
• Why isn't scalability better?
  – $\Theta(n \log n)$ sequential algorithm
  – Parallel version requires a bit-reversal data exchange
  – $\log n$ parallel phase steps