CS 6068 Parallel Computing, Fall 2013
Lecture 10 – Nov 18: The Parallel FFT
Prof. Fred Annexstein (@proffreda), fred.annexstein@uc.edu
Office Hours: 10-11 MWF or by appointment; Tel: 513-556-1807
Meeting: Mondays 6:00-8:50PM, Baldwin 645



Outline

• Fourier analysis
• Discrete Fourier transform
• Fast Fourier transform
• Parallel implementation


Discrete Fourier Transform

• Many applications in science and engineering
• Examples
  – Voice recognition
  – Image processing

• Straightforward implementation: $\Theta(n^2)$
• Fast Fourier transform: $\Theta(n \log n)$
• Parallel FFT: $\Theta(\log n)$ with $n$ processors
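Ignoring constant factors, the gap between the first two bounds is already large at modest sizes. For example, with $n = 2^{10}$:

$$
n^2 = 1{,}048{,}576, \qquad n \log_2 n = 1024 \cdot 10 = 10{,}240, \qquad \log_2 n = 10
$$

so at this size the FFT performs roughly 100 times fewer operations than the straightforward method, before constants are taken into account.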


Fourier Analysis

• Fourier analysis: Represent periodic continuous functions by (potentially infinite) series of sine and cosine functions

• Discrete Fourier transform: map a sequence over time to another sequence over frequency
  – Signal strength as a function of time
  – Fourier coefficients as a function of frequency


DFT Example (1/4)

16 data points representing signal strength over time


DFT Example (2/4)

DFT yields amplitudes and frequencies of sine/cosine functions


DFT Example (3/4)

Plot of four constituent sine/cosine functions and their sum


DFT Example (4/4)

Continuous function and original 16 samples.


DFT of Speech Sample

[Figure: DFT of the spoken phrase "Angora cats are furrier..."; panels show the signal, and its frequency and amplitude]

Figure courtesy Ron Cole and Yeshwant Muthusamy of the Oregon Graduate Institute


Computing DFT

• Matrix-vector product $F_n x$
  – $x$ is the input vector ($n$ signal samples)
  – $F_n$ is the $n$th-order Fourier matrix
  – $f_{i,j} = \omega_n^{ij}$ for $0 \le i, j < n$, where $\omega_n$ is the primitive $n$th root of unity (here $\omega_n = e^{2\pi i/n}$)
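A minimal Python sketch of the straightforward $\Theta(n^2)$ method: each output is the dot product of one row of $F_n$ with $x$. It assumes the convention $\omega_n = e^{2\pi i/n}$ used on the roots-of-unity slide (some libraries, NumPy among them, use the conjugate $e^{-2\pi i/n}$, which flips the signs of the imaginary parts). The name `dft` is illustrative.

```python
import cmath

def dft(x):
    """Direct DFT y = F_n x with f[k][j] = omega**(k*j), omega = e^(2*pi*i/n).

    Performs n*n complex multiply-adds, i.e. Theta(n^2) work.
    """
    n = len(x)
    y = []
    for k in range(n):          # row k of F_n
        acc = 0j
        for j in range(n):
            # omega**(k*j); reducing the exponent mod n keeps the angle small
            acc += cmath.exp(2j * cmath.pi * ((k * j) % n) / n) * x[j]
        y.append(acc)
    return y
```

For example, `dft([1, 2, 4, 3])` returns approximately `[10, -3-1j, 0, -3+1j]`, up to floating-point rounding.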


Discrete Fourier Transform

• Given a polynomial $a_0 + a_1 x + \dots + a_{n-1} x^{n-1}$, evaluate it at $n$ distinct points $x_0, \dots, x_{n-1}$.

• Key idea: choose $x_k = \omega^k$, where $\omega$ is the principal $n$th root of unity.
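Choosing $x_k = \omega^k$ makes polynomial evaluation the same computation as the matrix-vector product $F_n x$ defined earlier, with the coefficient vector $a = (a_0, \dots, a_{n-1})$ as input and $\omega = \omega_n$:

$$
y_k = A(\omega^k) = \sum_{j=0}^{n-1} a_j\,\omega^{kj} = \sum_{j=0}^{n-1} f_{k,j}\,a_j = (F_n a)_k, \qquad 0 \le k < n.
$$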


Example 1

• Compute DFT of vector (2, 3)

• $\omega_2$, the primitive square root of unity, is $-1$
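Carrying out the matrix-vector product with $\omega_2 = -1$:

$$
F_2 \begin{pmatrix} 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 2 \\ 3 \end{pmatrix}
= \begin{pmatrix} 2 + 3 \\ 2 - 3 \end{pmatrix}
= \begin{pmatrix} 5 \\ -1 \end{pmatrix}
$$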


Example 2

• Compute the DFT of vector (1, 2, 4, 3)
• The principal 4th root of unity is $\omega_4 = i$
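With $\omega_4 = i$, entry $f_{i,j} = i^{ij}$, so row 1 is $(1, i, -1, -i)$ and row 3 is $(1, -i, -1, i)$:

$$
F_4 \begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & i & -1 & -i \\ 1 & -1 & 1 & -1 \\ 1 & -i & -1 & i \end{pmatrix}
\begin{pmatrix} 1 \\ 2 \\ 4 \\ 3 \end{pmatrix}
= \begin{pmatrix} 1 + 2 + 4 + 3 \\ 1 + 2i - 4 - 3i \\ 1 - 2 + 4 - 3 \\ 1 - 2i - 4 + 3i \end{pmatrix}
= \begin{pmatrix} 10 \\ -3 - i \\ 0 \\ -3 + i \end{pmatrix}
$$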


Roots of Unity

• Def. An $n$th root of unity is a complex number $x$ such that $x^n = 1$.

• Fact. The $n$th roots of unity are $\omega^0, \omega^1, \dots, \omega^{n-1}$, where $\omega = e^{2\pi i/n}$.

• Pf. $(\omega^k)^n = (e^{2\pi i k/n})^n = (e^{\pi i})^{2k} = (-1)^{2k} = 1$.

• Fact. The $(n/2)$th roots of unity are $\nu^0, \nu^1, \dots, \nu^{n/2-1}$, where $\nu = e^{4\pi i/n}$.

• Fact. $\omega^2 = \nu$ and $(\omega^2)^k = \nu^k$.
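The facts above can be checked numerically. This is a small illustrative sketch for $n = 8$ (the helper names `roots_of_unity` and `close` are hypothetical); it also checks the halving property $\omega^{k+n/2} = -\omega^k$ that the FFT's combine step relies on, and the points labeled in the $n = 8$ diagram that follows.

```python
import cmath

def roots_of_unity(n):
    """Return [omega**0, ..., omega**(n-1)] where omega = e^(2*pi*i/n)."""
    return [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]

def close(a, b, tol=1e-9):
    """Compare complex numbers up to floating-point rounding."""
    return abs(a - b) < tol

n = 8
w = roots_of_unity(n)       # omega^k, the nth roots of unity
v = roots_of_unity(n // 2)  # nu^k, the (n/2)th roots, nu = e^(4*pi*i/n)

# Def./Fact: every omega^k is an nth root of unity
assert all(close(x ** n, 1) for x in w)
# Fact: (omega^k)^2 = nu^k, with exponents of nu taken mod n/2
assert all(close(w[k] ** 2, v[k % (n // 2)]) for k in range(n))
# Halving property used by the FFT combine step: omega^(k + n/2) = -omega^k
assert all(close(w[k + n // 2], -w[k]) for k in range(n // 2))
# Labeled points of the n = 8 diagram
assert close(w[2], 1j) and close(w[4], -1) and close(w[6], -1j)
```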


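[Figure: the $n = 8$ roots of unity on the unit circle: $\omega^0 = \nu^0 = 1$, $\omega^1$, $\omega^2 = \nu^1 = i$, $\omega^3$, $\omega^4 = \nu^2 = -1$, $\omega^5$, $\omega^6 = \nu^3 = -i$, $\omega^7$]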


Fast Fourier Transform via Divide and Conquer

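• Goal. Evaluate a degree-$(n-1)$ polynomial $A(x) = a_0 + \dots + a_{n-1} x^{n-1}$ at its $n$th roots of unity: $\omega^0, \omega^1, \dots, \omega^{n-1}$.

• Divide. Break the polynomial up into even and odd powers.
  – $A_{\text{even}}(x) = a_0 + a_2 x + a_4 x^2 + \dots + a_{n-2}\,x^{n/2-1}$
  – $A_{\text{odd}}(x) = a_1 + a_3 x + a_5 x^2 + \dots + a_{n-1}\,x^{n/2-1}$
  – $A(x) = A_{\text{even}}(x^2) + x\,A_{\text{odd}}(x^2)$

• Conquer. Evaluate the degree-$(n/2-1)$ polynomials $A_{\text{even}}$ and $A_{\text{odd}}$ at the $(n/2)$th roots of unity: $\nu^0, \nu^1, \dots, \nu^{n/2-1}$.

• Combine.
  – $A(\omega^k) = A_{\text{even}}(\nu^k) + \omega^k A_{\text{odd}}(\nu^k)$, $0 \le k < n/2$
  – $A(\omega^{k+n/2}) = A_{\text{even}}(\nu^k) - \omega^k A_{\text{odd}}(\nu^k)$, $0 \le k < n/2$
  – using $\omega^{k+n/2} = -\omega^k$ and $\nu^k = (\omega^k)^2 = (\omega^{k+n/2})^2$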
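The combine formulas follow by substituting $x = \omega^k$ and $x = \omega^{k+n/2}$ into $A(x) = A_{\text{even}}(x^2) + x\,A_{\text{odd}}(x^2)$, using $\omega^n = 1$ and $\omega^{n/2} = e^{\pi i} = -1$:

$$
\begin{aligned}
A(\omega^k) &= A_{\text{even}}(\omega^{2k}) + \omega^k A_{\text{odd}}(\omega^{2k}) = A_{\text{even}}(\nu^k) + \omega^k A_{\text{odd}}(\nu^k),\\
A(\omega^{k+n/2}) &= A_{\text{even}}(\omega^{2k}\omega^{n}) + \omega^{k}\omega^{n/2} A_{\text{odd}}(\omega^{2k}\omega^{n}) = A_{\text{even}}(\nu^k) - \omega^k A_{\text{odd}}(\nu^k).
\end{aligned}
$$

Each level therefore adds $\Theta(n)$ work on top of two half-size subproblems, so $T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log n)$.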


FFT Algorithm

fft(n, a_0, a_1, …, a_{n-1}) {
   if (n == 1) return a_0
   (e_0, e_1, …, e_{n/2-1}) ← fft(n/2, a_0, a_2, a_4, …, a_{n-2})
   (d_0, d_1, …, d_{n/2-1}) ← fft(n/2, a_1, a_3, a_5, …, a_{n-1})
   for k = 0 to n/2 - 1 {
      ω^k ← e^{2πik/n}
      y_k ← e_k + ω^k d_k
      y_{k+n/2} ← e_k - ω^k d_k
   }
   return (y_0, y_1, …, y_{n-1})
}
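A minimal runnable Python sketch of the same recursion, assuming the input length is a power of 2 and the convention $\omega = e^{2\pi i/n}$ from the roots-of-unity slide. The name `fft` and the list-based interface are illustrative choices, not a library API.

```python
import cmath

def fft(a):
    """Recursive radix-2 FFT mirroring the pseudocode.

    len(a) must be a power of 2. Returns y with y[k] = A(omega**k), where
    A(x) = a[0] + a[1]*x + ... + a[n-1]*x**(n-1) and omega = e^(2*pi*i/n).
    """
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of 2")
    if n == 1:
        return [complex(a[0])]
    e = fft(a[0::2])  # A_even evaluated at the (n/2)th roots of unity
    d = fft(a[1::2])  # A_odd evaluated at the (n/2)th roots of unity
    y = [0j] * n
    for k in range(n // 2):
        wk = cmath.exp(2j * cmath.pi * k / n)  # omega^k
        t = wk * d[k]
        y[k] = e[k] + t           # A(omega^k)
        y[k + n // 2] = e[k] - t  # A(omega^(k + n/2)), since omega^(k + n/2) = -omega^k
    return y
```

For example, `fft([1, 2, 4, 3])` reproduces Example 2, returning approximately `[10, -3-1j, 0, -3+1j]`. Slicing copies the data at every level; an in-place iterative formulation avoids those copies and works on the bit-reversed layout of the recursion tree.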


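Odd-Even Recursion Tree

a0, a1, a2, a3, a4, a5, a6, a7
a0, a2, a4, a6 | a1, a3, a5, a7
a0, a4 | a2, a6 | a1, a5 | a3, a7
a0 | a4 | a2 | a6 | a1 | a5 | a3 | a7

Each level moves the even-indexed elements of a list ahead of the odd-indexed ones (a perfect-shuffle-type permutation). The leaves appear in "bit-reversed" order of their indices:

a0   a4   a2   a6   a1   a5   a3   a7
000  100  010  110  001  101  011  111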
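A short Python sketch that computes this leaf order directly by reversing the $\log_2 n$ bits of each index (the function name `bit_reverse_indices` is hypothetical):

```python
def bit_reverse_indices(n):
    """Return the indices 0..n-1 in bit-reversed order (n must be a power of 2).

    Entry i is the number whose log2(n)-bit binary form is i's bits reversed,
    which is the leaf order of the odd-even recursion tree.
    """
    bits = n.bit_length() - 1
    if n < 1 or (1 << bits) != n:
        raise ValueError("n must be a power of 2")
    order = []
    for i in range(n):
        r = 0
        for b in range(bits):
            r = (r << 1) | ((i >> b) & 1)  # bit b of i ends up at position bits-1-b
        order.append(r)
    return order
```

`bit_reverse_indices(8)` gives `[0, 4, 2, 6, 1, 5, 3, 7]`, the leaf order in the tree. Because bit reversal is its own inverse, the same list also tells each element where to move during the permutation phase of the parallel algorithm.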


Phases of Parallel FFT Algorithm

• Phase 1: Processes permute the a's (global bit-reversal data communication pattern)

• Phase 2:
  – First log n − log p iterations of the FFT
  – Handled locally, in shared memory; no global communication is required

• Phase 3:
  – Final log p iterations must be handled globally
  – Processes are organized as a logical hypercube
  – In each iteration, every process swaps values with its partner across one hypercube dimension
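The three phases can be illustrated with a sequential Python simulation. This is a sketch under stated assumptions, not the course's implementation: it assumes $n$ and $p$ are powers of 2 with $p \le n$, and a block distribution in which process $r$ owns the $m = n/p$ consecutive positions starting at $r \cdot m$ (the slides do not specify the layout). The name `parallel_fft_sim` is hypothetical. It runs an iterative radix-2 FFT on bit-reversed input and records, for each butterfly stage, whether both operands stay on one process or which hypercube dimension the exchange crosses.

```python
import cmath

def parallel_fft_sim(a, p):
    """Sequentially simulate the three-phase parallel FFT (hypothetical helper).

    Assumes block distribution: process r owns positions r*m .. r*m + m - 1,
    m = n // p. Returns (y, stages) where y[k] = sum_j a[j] * omega**(j*k),
    omega = e^(2*pi*i/n), and stages labels each butterfly stage.
    """
    n = len(a)
    if n < 1 or n & (n - 1) or p < 1 or p & (p - 1) or p > n:
        raise ValueError("n and p must be powers of 2 with p <= n")
    m = n // p
    bits = n.bit_length() - 1

    def rev(i):
        # Reverse the low `bits` bits of i
        r = 0
        for b in range(bits):
            r = (r << 1) | ((i >> b) & 1)
        return r

    # Phase 1: bit-reversal permutation (a global exchange on a real machine)
    y = [complex(a[rev(i)]) for i in range(n)]

    stages = []
    h = 1  # half-span of the butterflies in the current stage
    while h < n:
        if h < m:
            # Phase 2: each butterfly group of size 2h lies inside one block
            stages.append(("local", h))
        else:
            # Phase 3: process r pairs with r XOR (h // m), one hypercube dimension
            stages.append(("hypercube", (h // m).bit_length() - 1))
        for start in range(0, n, 2 * h):
            for k in range(h):
                w = cmath.exp(2j * cmath.pi * k / (2 * h))  # k-th power of the (2h)th root
                t = w * y[start + k + h]
                y[start + k], y[start + k + h] = y[start + k] + t, y[start + k] - t
        h *= 2
    return y, stages
```

With $n = 16$ and $p = 4$, the stage list is two `"local"` stages ($\log n - \log p = 2$) followed by `"hypercube"` stages across dimensions 0 and 1 ($\log p = 2$), matching Phases 2 and 3.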


FFT in Practice: Sequential and Parallel

• Fastest Fourier Transform in the West (FFTW) [Frigo and Johnson]
  – Optimized C library
  – Features: DFT, DCT, real, complex, any size, any dimension
  – Won the 1999 Wilkinson Prize for Numerical Software
  – Portable; competitive with vendor-tuned code

• The NVIDIA CUDA Fast Fourier Transform library (cuFFT) provides a simple interface for computing FFTs on the GPU; NVIDIA cites speedups of up to 10x. By using hundreds of processor cores inside NVIDIA GPUs, cuFFT delivers the floating-point performance of a GPU without requiring you to develop a custom GPU FFT.

Reference: http://www.fftw.org


Summary

• Discrete Fourier transform used in many scientific and engineering applications

• The fast Fourier transform is important because it computes the DFT in $\Theta(n \log n)$ time

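• Developed a parallel implementation of the FFT

• Why isn't scalability better?
  – The sequential algorithm is only $\Theta(n \log n)$, so there is little computation per data element to offset communication costs
  – The parallel version requires a global bit-reversal data exchange
  – The parallel phases still take log n steps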