Parallel Fourier Transform: A Practical Guide

Dhrubaditya Mitra

Indian Institute of Science, Bangalore, 560012

Parallel Fourier Transform – p.1/29


Outline

Motivation

Serial FFT
  Serial FFT: Basic Algorithm
  FFT of Real Data
  FFT in d>1
  Limitations
  Advertising FFTW

Parallel FFT

Applications : Spectral Methods


Motivation

"If you speed up any non-trivial algorithm by a factor of a million or so, the world will beat a path towards finding useful applications for it." - Numerical Recipes

Convolution and Correlation

Optimal Filtering

Power Spectrum Estimation

Integral Transforms (Wavelet and Fourier)

Spectral Method for solving PDE


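(Continuous) Fourier Transform

(Inverse) Fourier Transform

$$h(t) = \int_{-\infty}^{\infty} H(f)\, e^{-2\pi i f t}\, df$$

(Forward) Fourier Transform

$$H(f) = \int_{-\infty}^{\infty} h(t)\, e^{2\pi i f t}\, dt$$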


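Discrete Fourier Transform

Finite time series, sampled at an interval $\Delta$

$$h_k = h(t_k), \quad t_k = k\Delta, \quad k = 0, 1, \ldots, N-1$$

Discrete (Forward) Fourier Transform

$$H(f_n) \approx \Delta \sum_{k=0}^{N-1} h_k\, e^{2\pi i f_n t_k} = \Delta \sum_{k=0}^{N-1} h_k\, e^{2\pi i k n/N} \equiv \Delta\, H_n$$

with $f_n = n/(N\Delta)$, where $n = -N/2, \ldots, N/2$

Discrete (Inverse) Fourier Transform

$$h_k = \frac{1}{N} \sum_{n=0}^{N-1} H_n\, e^{-2\pi i k n/N}$$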


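Discrete Fourier Transform (contd)

Discrete Fourier Transform

$$H_n = \sum_{k=0}^{N-1} h_k\, e^{2\pi i k n/N}$$

Output in "wrap-around order": zero frequency first, then the positive frequencies, then the negative frequencies.

Naively, an order $N^2$ algorithm!

$$H_n = \sum_{k=0}^{N-1} W^{nk}\, h_k, \qquad W \equiv e^{2\pi i/N}$$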
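The direct sum can be written out as a matrix-vector product, which makes the $N^2$ cost visible. The sketch below is illustrative only: the names `naive_dft` and `wraparound_frequencies` are hypothetical, and the sign of the exponent is passed as a parameter because conventions differ (NumPy's forward transform uses the negative sign).

```python
import numpy as np

def naive_dft(h, sign=+1):
    """Direct O(N^2) evaluation of H_n = sum_k h_k exp(sign * 2*pi*i*k*n/N)."""
    h = np.asarray(h, dtype=complex)
    N = h.size
    k = np.arange(N)
    # N x N matrix of phase factors W^(n*k), with W = exp(sign * 2*pi*i / N)
    W = np.exp(sign * 2j * np.pi * np.outer(k, k) / N)
    return W @ h

def wraparound_frequencies(N, delta=1.0):
    """Frequencies f_n in the order the DFT returns them (assumes even N)."""
    n = np.concatenate([np.arange(0, N // 2), np.arange(-N // 2, 0)])
    return n / (N * delta)
```

For N = 8 and unit sampling interval, `wraparound_frequencies(8)` gives the same ordering as `np.fft.fftfreq(8)`: zero first, then positive frequencies, then negative ones.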


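Fast Fourier Transform

$$
\begin{aligned}
H_n &= \sum_{k=0}^{N-1} e^{2\pi i k n/N}\, h_k \\
    &= \sum_{k=0}^{N/2-1} e^{2\pi i (2k) n/N}\, h_{2k} + \sum_{k=0}^{N/2-1} e^{2\pi i (2k+1) n/N}\, h_{2k+1} \\
    &= \sum_{k=0}^{N/2-1} e^{2\pi i k n/(N/2)}\, h_{2k} + W^n \sum_{k=0}^{N/2-1} e^{2\pi i k n/(N/2)}\, h_{2k+1} \\
    &= H_n^{e} + W^n H_n^{o}
\end{aligned}
$$

with $W = e^{2\pi i/N}$; $H_n^{e}$ and $H_n^{o}$ are the transforms of length $N/2$ of the even-indexed and odd-indexed samples.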
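The even/odd splitting translates almost line by line into a recursive routine. This is a minimal sketch, not a tuned implementation: the function name `fft_recursive` is hypothetical, the length is assumed to be a power of two, and the exponent sign $e^{+2\pi i k n/N}$ is an assumed convention.

```python
import cmath

def fft_recursive(h):
    """Radix-2 FFT of a sequence whose length is a power of two.

    Assumed sign convention: H_n = sum_k h_k exp(+2*pi*i*k*n/N).
    """
    N = len(h)
    if N == 1:
        return [complex(h[0])]
    if N % 2:
        raise ValueError("length must be a power of two")
    even = fft_recursive(h[0::2])   # transform of the even-indexed samples
    odd = fft_recursive(h[1::2])    # transform of the odd-indexed samples
    H = [0j] * N
    for n in range(N // 2):
        t = cmath.exp(2j * cmath.pi * n / N) * odd[n]   # W^n times the odd part
        H[n] = even[n] + t
        H[n + N // 2] = even[n] - t   # uses W^(n + N/2) = -W^n
    return H
```

Each level of recursion does O(N) work to combine two half-size transforms, which is where the N log N total comes from.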


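Fast Fourier Transform (contd)

Split recursively into even (e) and odd (o) halves, down to transforms of length 1:

$$H_n^{eoeeo \cdots oee} = h_k$$

for some k

To find the corresponding k: bit reversal

Reverse the pattern of e's and o's and set e = 0, o = 1; this gives k in binary.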
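The bit-reversal rule is easy to check numerically. A small sketch, with hypothetical helper names `bit_reverse` and `bit_reversal_permutation`, assuming N is a power of two:

```python
def bit_reverse(k, nbits):
    """Return the integer whose nbits-bit binary representation is that of k, reversed."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (k & 1)   # move the lowest bit of k onto the end of r
        k >>= 1
    return r

def bit_reversal_permutation(N):
    """Order in which the inputs appear after repeated even/odd splitting."""
    nbits = N.bit_length() - 1   # N is assumed to be a power of two
    return [bit_reverse(k, nbits) for k in range(N)]
```

For N = 8 this returns `[0, 4, 2, 6, 1, 5, 3, 7]`, the same order obtained by splitting the indices 0..7 into even and odd halves three times.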


Algorithmic Complexity

Let $T(N)$ denote the amount of computation needed for an array of size $N$. Then,

$$T(N) = 2\,T(N/2) + O(N)$$

Hence

$$T(N) = O(N \log_2 N)$$
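The N log N count can be obtained by unrolling the recurrence. A short derivation, assuming the cost obeys $T(N) = 2T(N/2) + cN$ (two half-size transforms plus linear work to combine them) and $N = 2^m$; the constant $c$ and the exponent $m$ are notation introduced here for illustration:

$$
\begin{aligned}
T(N) &= 2\,T(N/2) + cN \\
     &= 4\,T(N/4) + 2cN \\
     &= \cdots \\
     &= 2^m\,T(1) + m\,cN \\
     &= N\,T(1) + cN \log_2 N
\end{aligned}
$$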


Serial FFT: Improvements

Twiddle Factor Trick: evaluation of the twiddle factors $W^n = e^{2\pi i n/N}$ involves time-consuming evaluation of sines and cosines.
  Make a look-up table.
  Many of the $W^n$ are trivial (such as $\pm 1$ or $\pm i$).

Cache Efficiency
  One FLOP is faster than a memory access.
  Cache overflow
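A look-up table of twiddle factors can be built once per transform size and reused by every butterfly stage. The sketch below is one possible arrangement, not the method of any particular library: `twiddle_table` and `fft_iterative` are hypothetical names, N is assumed to be a power of two, and the exponent sign is assumed to be positive.

```python
import numpy as np

def twiddle_table(N):
    """Precompute W^n = exp(2*pi*i*n/N) for n = 0 .. N/2 - 1, once per transform size."""
    return np.exp(2j * np.pi * np.arange(N // 2) / N)

def fft_iterative(h, table=None):
    """Iterative radix-2 FFT (sign convention exp(+2*pi*i*k*n/N)), N a power of two."""
    H = np.asarray(h, dtype=complex).copy()
    N = H.size
    if table is None:
        table = twiddle_table(N)
    nbits = N.bit_length() - 1
    # Bit-reversal reordering of the input.
    for k in range(N):
        r = int(format(k, "b").zfill(nbits)[::-1], 2)
        if r > k:
            H[k], H[r] = H[r], H[k]
    # Butterfly stages of size 2, 4, ..., N; no sine or cosine is evaluated here.
    size = 2
    while size <= N:
        half, stride = size // 2, N // size
        w = table[::stride][:half]   # twiddles for this stage, read from the table
        for start in range(0, N, size):
            a = H[start:start + half].copy()
            b = w * H[start + half:start + size]
            H[start:start + half] = a + b
            H[start + half:start + size] = a - b
        size *= 2
    return H
```

Passing the same `table` to repeated calls of the same size avoids recomputing the trigonometric functions at all, which is the point of the trick.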


Serial FFT: Improvements(contd.)

Efficiency crucially depends on the particular system architecture.

USE SYSTEM LIBRARIES (ESSL on IBM, DXML on Compaq, etc.)

Disadvantage: no portability

Exception: FFTW


Serial FFT: Real Data

For real data, $H_{-n} = H_n^{*}$

Calculate only half the output!

Two for the price of one

A different algorithm for real data.

In-place transform.
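"Two for the price of one" is commonly understood as transforming two real arrays with a single complex FFT. A minimal sketch of that reading follows; the function name `two_real_ffts` is hypothetical, and NumPy's `np.fft.fft` (which uses the negative exponent sign) stands in for whatever complex FFT is available.

```python
import numpy as np

def two_real_ffts(a, b):
    """Transforms of two real arrays of equal length from one complex FFT."""
    z = np.asarray(a, dtype=float) + 1j * np.asarray(b, dtype=float)
    Z = np.fft.fft(z)
    # conj(Z_{N-n}), with index N wrapped around to 0
    Zr = np.conj(np.roll(Z[::-1], 1))
    A = 0.5 * (Z + Zr)     # transform of a
    B = -0.5j * (Z - Zr)   # transform of b
    return A, B
```

The separation works because the transform of a real array satisfies the conjugate symmetry above, so the parts of `Z` coming from `a` and from `b` can be told apart.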


Serial FFT : Data Storage

[Figure: array cells labelled 0, 1, 2, ..., N-1, N, N+1]

Array of real data, of size N. The last two elements (N, N+1) are zero.


[Figure: frequency axis from -N/2 through 0 to N/2]

Fourier transform of real data: complex data. No array element is zero, but $H_{-n} = H_n^{*}$.


[Figure: frequency axis from 0 to N/2]

Only the positive-frequency part of the complex data.


[Figure: frequency axis from 0 to N/2, each element split into real and imaginary parts]

Only the positive-frequency part of the complex data, with real and imaginary parts explicitly shown.


zero-padding

The imaginary parts of $H_0$ and $H_{N/2}$ are zero.
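The storage scheme can be reproduced with NumPy's real-input transform: N real samples become N/2 + 1 complex values, which occupy exactly N + 2 reals. The helper names `packed_real_fft` and `unpack_real_fft` are hypothetical, N is assumed to be even, and NumPy's sign convention is used.

```python
import numpy as np

def packed_real_fft(x):
    """Transform N real samples and return the result as N + 2 reals."""
    H = np.fft.rfft(x)          # N/2 + 1 complex values, n = 0 .. N/2
    return H.view(np.float64)   # Re H_0, Im H_0, Re H_1, Im H_1, ...

def unpack_real_fft(packed, N):
    """Inverse of packed_real_fft: recover the N real samples."""
    H = np.asarray(packed, dtype=np.float64).view(np.complex128)
    return np.fft.irfft(H, n=N)
```

In the packed array, element 1 (the imaginary part of the zero-frequency term) and the last element (the imaginary part of the N/2 term) vanish for real input, which is why an array of N + 2 reals with two padding slots is enough for an in-place transform.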


Serial FFT: 2-d

Essentially two transforms in two directions.

Computational complexity: $O(N^2 \log N)$ for an $N \times N$ array

Output in "wrap-around order"

Real input data
  Zero-padding in the first direction (for Fortran).
  Output in the second direction in wrap-around order.
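The statement that a 2-d transform is two sets of 1-d transforms can be checked directly. A minimal sketch using NumPy (the function name `fft2_by_rows_and_columns` is hypothetical):

```python
import numpy as np

def fft2_by_rows_and_columns(a):
    """2-d FFT built from 1-d FFTs along each direction in turn."""
    a = np.asarray(a, dtype=complex)
    step1 = np.fft.fft(a, axis=0)      # 1-d transforms along the first direction
    return np.fft.fft(step1, axis=1)   # 1-d transforms along the second direction
```

The result agrees with `np.fft.fft2`. For real input, note that NumPy's `np.fft.rfft2` halves the last axis (C ordering), whereas a Fortran code would naturally halve and pad the first direction.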


Fast Fourier Transform : Limitations

N should be a product of powers of small primes.

For prime N, algorithms are slower.

Aliasing Error.


Advertising FFTW (www.fftw.org)

Portable but efficient

Callable from both C and Fortran.

Free !! under GPL

Parallel transforms for both shared-memory and distributed-memory machines


Is FFT Parallelisable?

Naive approach to parallelising FFT:
  Divide the data equally among two processors.
  FFT now involves bit reversal, which implies lots of communication between the processors.
  Communication is much slower than FLOPS (or local memory access).

Lessons:
  Multidimensional FFT will be better parallelisable than 1-d.
  Parallelisation of 1-d FFT essentially needs a new approach.


Why Parallelise FFT?

Fluid Dynamics: Turbulence
  Direct Numerical Simulation of the Navier-Stokes equation by the spectral method with [...] grid points requires about [...] GB of memory.

Weather Prediction

Complex Fluids

The Earth Simulator


The Earth Simulator

Nodes: 640 processor nodes, each with 8 vector processors.

Each node has 16 GB of shared memory, i.e. each node is a shared-memory machine.

Communication: 16 GB/s in full-duplex mode.

Total peak performance: 40 TFLOPS, which is of course only theoretical.

Computing paradigm: both shared-memory and distributed-memory coding.


2-d Parallel FFT

Divide the array equally among 2 processors, along the second direction (Fortran). (Word of caution: this is non-standard.)

Do FFT along the local direction.

Transpose (The time consuming communication step)

Do FFT along the local direction (again).

Transpose Back (Optional step)

Transposing Trick

3-d FFT parallelises better than 2-d
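The transpose-based scheme can be imitated on a single machine by treating slabs of the array as the data owned by each processor. This is only a serial simulation under that assumption: the name `slab_fft2` is hypothetical, the list entries stand in for processors, and the transposes stand in for the all-to-all communication a real distributed code would perform.

```python
import numpy as np

def slab_fft2(a, nproc=2):
    """Simulate a slab-decomposed 2-d FFT; each list entry plays the role of one processor."""
    a = np.asarray(a, dtype=complex)
    n0, n1 = a.shape
    if n0 % nproc or n1 % nproc:
        raise ValueError("both dimensions must be divisible by nproc")
    # Each "processor" owns a contiguous block along the second direction.
    slabs = np.split(a, nproc, axis=1)
    # Step 1: FFT along the local (first) direction; no communication needed.
    slabs = [np.fft.fft(s, axis=0) for s in slabs]
    # Step 2: global transpose. On a real machine this is the communication step.
    full_t = np.concatenate(slabs, axis=1).T   # shape (n1, n0)
    slabs_t = np.split(full_t, nproc, axis=1)
    # Step 3: FFT along the direction that has now become local.
    slabs_t = [np.fft.fft(s, axis=0) for s in slabs_t]
    # Step 4 (optional): transpose back to the original layout.
    return np.concatenate(slabs_t, axis=1).T
```

Skipping step 4 leaves the result in transposed layout, which is acceptable when the next operation (for example an inverse transform) can start from that layout and so saves one round of communication.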


Parallel FFT (contd)

FFT of real data:
  Do NOT divide along the first direction (Fortran).
  PESSL data storage is really weird.
  FFTW is much easier to use, but not so fast.

Bandwidth is more important than latency.

Does NOT scale well.

Code must be tuned to the architecture.


Fast Fourier Transform in ES

Divide the array equally among Processor Nodes (PN), along the 3rd direction (Fortran 90).

Use threading (parallel do loops) with 8 arithmetic processors (AP) in each PN, to FFT in 1-d. (Automatic parallelisation was not effective enough!)

Use vectorization and microtasking in each AP.

Peak performance of each AP is 8 Gflops. Bandwidth between AP and (shared) memory is 32 GB/s, i.e. the processor gets only one (double-precision) number for two flops. But in radix-2 FFT, memory access : flops = 1 : 1, i.e. if radix-2 FFT is used the processor will be kept waiting for data from memory.

Solution : Use Radix-4 FFT.

Data transposition through communication.

Fourier transform in the 3rd direction.
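The "one number for two flops" figure follows from the quoted peak rate and bandwidth. A short arithmetic check, using only the numbers stated on the slides (8 bytes per double-precision number is the one added assumption):

```python
PEAK_FLOPS_PER_AP = 8e9    # 8 Gflops per arithmetic processor (from the slide)
MEMORY_BANDWIDTH = 32e9    # 32 GB/s between an AP and shared memory (from the slide)
BYTES_PER_DOUBLE = 8       # assumption: IEEE double precision

# Double-precision numbers delivered per second, and flops available per number.
numbers_per_second = MEMORY_BANDWIDTH / BYTES_PER_DOUBLE
flops_per_number = PEAK_FLOPS_PER_AP / numbers_per_second

# Aggregate peak: 640 nodes with 8 APs each.
NODES, APS_PER_NODE = 640, 8
total_peak_tflops = NODES * APS_PER_NODE * PEAK_FLOPS_PER_AP / 1e12
```

Here `flops_per_number` evaluates to 2.0, and `total_peak_tflops` to 40.96, consistent with the quoted 40 TFLOPS peak. An algorithm that needs one memory access per flop therefore cannot run faster than about half of peak on this machine.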


Is ES big enough?

Single-precision run of [...] DNS. (world record)

Double-precision run of [...] DNS. (world record)

Is this pursuit of huge sizes physically meaningful? YES

Even world-record simulations give a Reynolds number as small as [...], whereas experimental data give values about 10 times larger.

The Earth (simulator) is Not Enough


Spectral Method:Burgers Equation

1 dimensional, nonlinear PDE

periodic boundary condition

time stepping not spectral (obvious)

Evaluate Derivatives in Fourier Space

Products in Real Space.
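The recipe above (derivatives in Fourier space, products in real space, ordinary time stepping) can be sketched for the viscous Burgers equation, taken here in the common form $u_t + u\,u_x = \nu\,u_{xx}$ on a periodic domain. Everything specific in this sketch is an assumption made for illustration: the form of the equation, the parameter values, the midpoint time-stepping scheme, and the function names `burgers_rhs` and `step_rk2`. No dealiasing is applied.

```python
import numpy as np

def burgers_rhs(u, k, nu):
    """Right-hand side of u_t = -u u_x + nu u_xx, evaluated pseudo-spectrally."""
    u_hat = np.fft.fft(u)
    ux = np.real(np.fft.ifft(1j * k * u_hat))       # first derivative, in Fourier space
    uxx = np.real(np.fft.ifft(-(k ** 2) * u_hat))   # second derivative, in Fourier space
    return -u * ux + nu * uxx                       # product u * u_x formed in real space

def step_rk2(u, k, nu, dt):
    """One midpoint (second-order Runge-Kutta) time step; time stepping is not spectral."""
    u_mid = u + 0.5 * dt * burgers_rhs(u, k, nu)
    return u + dt * burgers_rhs(u_mid, k, nu)

# Illustrative parameters (assumptions, not taken from the slides).
N, L, nu, dt = 64, 2 * np.pi, 0.1, 1e-3
x = np.arange(N) * L / N
k = 2 * np.pi * np.fft.fftfreq(N, d=L / N)   # wavenumbers in wrap-around order
u = np.sin(x)
for _ in range(100):
    u = step_rk2(u, k, nu, dt)
```

Each evaluation of the right-hand side costs three FFTs of length N, so the transform dominates the run time, which is why its parallel performance matters.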


Spectral Method : Virtues

Best spatial derivative possible. Better than any finite difference.

Quite fast (thanks to FFT)

Often you are interested in Fourier-space quantities for physical reasons (e.g. the energy spectrum).
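The accuracy claim can be illustrated by differentiating a smooth periodic function both ways. A small sketch; the test function $e^{\sin x}$, the grid size and the second-order central difference used for comparison are all choices made here for illustration.

```python
import numpy as np

N = 32
x = 2 * np.pi * np.arange(N) / N
f = np.exp(np.sin(x))
exact = np.cos(x) * f

# Spectral derivative: multiply by i*k in Fourier space.
k = np.fft.fftfreq(N, d=1.0 / N)   # integer wavenumbers in wrap-around order
spectral = np.real(np.fft.ifft(1j * k * np.fft.fft(f)))

# Second-order central difference on the same periodic grid.
dx = x[1] - x[0]
central = (np.roll(f, -1) - np.roll(f, 1)) / (2 * dx)

err_spectral = np.max(np.abs(spectral - exact))
err_central = np.max(np.abs(central - exact))
```

On this grid the spectral error is close to rounding error, while the central-difference error is many orders of magnitude larger; a higher-order finite difference narrows the gap but, for smooth periodic data, does not close it.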


Spectral Method : Vices

Difficult to work with anything other than periodic boundary conditions.

FFT is not very parallelisable.


Conclusions

Parallel FFT holds the key to huge simulations in almost any field of research.

Performance of parallel FFT depends crucially on tuning the code to the architecture of the parallel machine.

FFTW seems to be the best choice for single-node FFT.

Even the Earth Simulator is not the limit.

The story of Beowulf.


BEOWULF


Acknowledgements

CSIR, India

Indo-French Centre of Advanced Scientific Research.

Rahul Pandit, Dept of Physics, IISc

Chinmay Das, Pinaki Chaudhuri.

Jasjeet Singh Bagla, MRI, Allahabad

Yanik Ponty, Observatoire de Nice, France.
