The FFT on a GPU
Graphics Hardware 2003
July 27, 2003
Kenneth Moreland, Sandia National Labs
Edward Angel, U. of New Mexico
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Overview
• Introduction
– Motivation, FFT review.
• FFT Techniques
– Exploitable FFT properties.
• Implementation
• Results
– Performance, applications, conclusions.
Motivation
• The Fourier transform is a principal tool for digital image processing.
– Filtering.
– Correction.
– Compression.
– Classification.
– Generation.
• As such, should not our graphics hardware support such a tool?
The Discrete Fourier Transform
• Converts data in the spatial or temporal domain into frequencies the data comprise.
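$$\mathcal{F}\{f(x)\} = F(u) = \frac{1}{N} \sum_{x=0}^{N-1} f(x)\, W_N^{ux}$$
$$\mathcal{F}^{-1}\{F(u)\} = f(x) = \sum_{u=0}^{N-1} F(u)\, W_N^{-ux}$$
$$W_N = e^{-j 2\pi / N}$$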
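A minimal sketch of the transform pair, assuming the convention in which the forward transform carries the 1/N factor and W_N = e^{-j 2π / N}. The function names `dft` and `idft` are illustrative and are not taken from the talk. This is the brute force O(N²) definition, useful mainly as a reference for checking faster versions.

```python
import cmath

def dft(f):
    """Forward DFT: F(u) = (1/N) * sum_x f(x) * W_N^(u x), with W_N = exp(-2j*pi/N)."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
        for u in range(n)
    ]

def idft(F):
    """Inverse DFT: f(x) = sum_u F(u) * W_N^(-u x); no 1/N factor in this convention."""
    n = len(F)
    return [
        sum(F[u] * cmath.exp(2j * cmath.pi * u * x / n) for u in range(n))
        for x in range(n)
    ]

# Round trip on a small sample signal (values chosen arbitrarily).
signal = [1.0, 2.0, 0.0, -1.0]
spectrum = dft(signal)
restored = idft(spectrum)
```

With this normalization, `spectrum[0]` is the mean of the input rather than its sum.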
The Discrete Fourier Transform
• 2D transform can be computed by applying the transform in one direction, then the other.
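DFT:
$$\mathcal{F}\{f(x,y)\} = F(u,v) = \frac{1}{MN} \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} f(x,y)\, W_M^{ux}\, W_N^{vy}$$
IDFT:
$$\mathcal{F}^{-1}\{F(u,v)\} = f(x,y) = \sum_{v=0}^{N-1} \sum_{u=0}^{M-1} F(u,v)\, W_M^{-ux}\, W_N^{-vy}$$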
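The separability claim can be checked numerically. The sketch below assumes the same convention as the 1D transform (forward factor 1/(MN), negative exponent), stores the image as a list of N rows of M values, and uses illustrative function names. It transforms every row, then every column, and compares the result with the direct double sum.

```python
import cmath

def dft_1d(seq):
    """1D forward DFT with a 1/len(seq) factor."""
    n = len(seq)
    return [
        sum(seq[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
        for u in range(n)
    ]

def dft_2d_direct(img):
    """Direct double sum; img[y][x] with N rows and M columns, result indexed [v][u]."""
    rows, cols = len(img), len(img[0])
    return [
        [
            sum(
                img[y][x]
                * cmath.exp(-2j * cmath.pi * u * x / cols)
                * cmath.exp(-2j * cmath.pi * v * y / rows)
                for y in range(rows)
                for x in range(cols)
            ) / (rows * cols)
            for u in range(cols)
        ]
        for v in range(rows)
    ]

def dft_2d_separable(img):
    """Apply the 1D transform along each row, then along each column."""
    row_pass = [dft_1d(row) for row in img]
    col_pass = [dft_1d(list(col)) for col in zip(*row_pass)]  # indexed [u][v]
    return [list(row) for row in zip(*col_pass)]              # back to [v][u]

image = [[1.0, 2.0, 3.0, 4.0],
         [0.0, -1.0, 2.0, 5.0]]
direct = dft_2d_direct(image)
separable = dft_2d_separable(image)
```

The separable form costs O(MN(M + N)) with brute force 1D transforms, compared with O(M²N²) for the direct sum.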
The Fast Fourier Transform
• Divide and Conquer Algorithm
– Input sequence is divided into subsequences consisting of values from even and odd indices, respectively.
$$F(u) = F_e(u) + W_N^{u} F_o(u)$$
$$f_e(x) = f(2x), \qquad f_o(x) = f(2x+1)$$
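The even/odd split translates directly into a recursive routine. The sketch below is illustrative only: it works on unnormalized sums, so the 1/N factor would have to be applied separately afterwards (an assumption about where scaling happens), it requires a power-of-two length, and the names are not from the talk.

```python
import cmath

def fft_recursive(f):
    """Unnormalized radix-2 FFT: F(u) = F_e(u) + W_N^u * F_o(u)."""
    n = len(f)
    if n == 1:
        return [complex(f[0])]
    even = fft_recursive(f[0::2])  # f_e(x) = f(2x)
    odd = fft_recursive(f[1::2])   # f_o(x) = f(2x + 1)
    half = n // 2
    out = []
    for u in range(n):
        w = cmath.exp(-2j * cmath.pi * u / n)  # W_N^u
        # F_e and F_o have period N/2, so u wraps around for the upper half.
        out.append(even[u % half] + w * odd[u % half])
    return out

def dft_unnormalized(f):
    """Brute force reference without the 1/N factor."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]

samples = [1.0, 0.0, -2.0, 3.0, 4.0, 1.5, 0.0, -1.0]
fast = fft_recursive(samples)
slow = dft_unnormalized(samples)
```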
Index Magic
• Do not use recursion.
– Use dynamic programming: iterate over entire array computing all values for each recursive depth together, like mergesort.
• Indexing is non-obvious.
– Unlike mergesort, recursive step does not divide array into contiguous chunks.
– At any iteration, what partition does a given index belong to, and where can one find the applicable values of the sub-partitions?
Index Magic
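• Common solution: rearrange data by reversing the bits of indices.
– FFT can occur with contiguous partitions.
– Requires an extra data copy.
• Our solution: determine indexing in place.
$$A_i[n] = A_{i-1}\!\left[n + u\,\frac{N}{2^i}\right] + W_{2^i}^{u}\, A_{i-1}\!\left[n + u\,\frac{N}{2^i} + \frac{N}{2^i}\right] \quad \text{(indices taken modulo } N\text{)}$$
$$u = n \;\mathrm{div}\; \frac{N}{2^i}$$
Note that the paper has a typo.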
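One indexing scheme that avoids bit reversal can be sketched as follows. It is a reconstruction under stated assumptions, not necessarily the authors' exact formula: at iteration i the array holds N/2^i interleaved sub-transforms of size 2^i, entry n belongs to frequency u = n div (N/2^i), and it reads the previous iteration at n + u·N/2^i and at N/2^i beyond that, with indices wrapped modulo N. The plus signs and the wrap are assumptions checked here against a brute force DFT. The sums are unnormalized, and two buffers are swapped per iteration.

```python
import cmath

def fft_iterative(f):
    """Unnormalized iterative FFT without a bit-reversal copy (power-of-two length)."""
    n = len(f)
    stages = n.bit_length() - 1
    if n < 1 or (1 << stages) != n:
        raise ValueError("length must be a power of two")
    src = [complex(v) for v in f]
    for i in range(1, stages + 1):
        stride = n >> i  # N / 2^i: number of interleaved sub-transforms
        size = 1 << i    # 2^i: length of each sub-transform
        dst = [0j] * n
        for idx in range(n):
            u = idx // stride                         # u = n div (N / 2^i)
            w = cmath.exp(-2j * cmath.pi * u / size)  # W_{2^i}^u
            a = (idx + u * stride) % n                # "even" sub-transform value
            b = (a + stride) % n                      # "odd" sub-transform value
            dst[idx] = src[a] + w * src[b]
        src = dst  # the buffer just written becomes the next input
    return src

def dft_unnormalized(f):
    """Brute force reference without the 1/N factor."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n))
        for u in range(n)
    ]

samples = [2.0, -1.0, 0.5, 3.0, 1.0, 0.0, -2.5, 4.0]
fast = fft_iterative(samples)
slow = dft_unnormalized(samples)
```

Every output entry depends only on its own index and on two reads of the previous buffer, which is the access pattern a per-pixel fragment program needs.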
Fourier Symmetry of Real Sequences
• In general, the frequency spectrum of even a real function contains imaginary values.
– Captures magnitude and phase shift of sinusoids.
• Brute force FFT doubles computation and storage costs.
• But, Fourier transforms of real functions have symmetry.
– $\forall u,\ F(u) = F^{*}(N-u)$
– Values at $F(0)$ and $F(N/2)$ are real (because they are conjugates with themselves).
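The symmetry is easy to confirm on a small example. The sketch below assumes the 1/N forward convention used earlier in the talk; `is_conjugate_symmetric` is an illustrative helper name, and F(N) is taken to equal F(0) by periodicity.

```python
import cmath

def dft(f):
    """Forward DFT with the 1/N factor."""
    n = len(f)
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
        for u in range(n)
    ]

def is_conjugate_symmetric(F, tol=1e-9):
    """Check F(u) == conj(F(N - u)) for every u, wrapping N - 0 to 0."""
    n = len(F)
    return all(abs(F[u] - F[(n - u) % n].conjugate()) < tol for u in range(n))

real_signal = [3.0, 1.0, -2.0, 5.0, 0.5, -1.0, 4.0, 2.0]
spectrum = dft(real_signal)
symmetric = is_conjugate_symmetric(spectrum)
```

Only F(0) through F(N/2) are therefore independent, which is what makes the later packing possible.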
Fourier Transform of Real Functions
• Pick two functions, let them be f(x) and g(x).
• Let h(x) = f(x) + j g(x).
– Note that there is no loss of information.
• Can perform the FFT of h in half the time it takes to perform the brute force FFT of f and g individually.
– Simply point to one row of the image as the real components and another as the imaginary components.
[Figure: two image rows labeled f and g.]
Untangling Fourier Transform Pairs
• Fourier transform is linear.
– H(u) = F(u) + j G(u)
• We can “untangle” using symmetry of F and G.
– Add and subtract H(u) and H(N – u) to cancel out conjugate terms of F and G.
$$H(u) + H(N-u) = F(u) + F(N-u) + j\,[G(u) + G(N-u)] = 2F_R(u) + j\,2G_R(u)$$
$$H(u) - H(N-u) = j\,2F_I(u) - 2G_I(u)$$
Untangling Fourier Transform Pairs
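$$F_R(u) = \tfrac{1}{2}\,[H_R(u) + H_R(N-u)]$$
$$F_I(u) = \tfrac{1}{2}\,[H_I(u) - H_I(N-u)]$$
$$G_R(u) = \tfrac{1}{2}\,[H_I(u) + H_I(N-u)]$$
$$G_I(u) = -\tfrac{1}{2}\,[H_R(u) - H_R(N-u)]$$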
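The tangle and untangle steps can be sketched end to end: pack two real sequences f and g into h = f + j g, transform once, and recover F and G from H(u) and H(N − u). The function names are illustrative, a brute force DFT stands in for the FFT, and H(N) is read as H(0) by periodicity.

```python
import cmath

def dft(seq):
    """Forward DFT with the 1/N factor (brute force stand-in for the FFT)."""
    n = len(seq)
    return [
        sum(seq[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
        for u in range(n)
    ]

def untangle(H):
    """Split H = F + jG into F and G, assuming f and g were both real."""
    n = len(H)
    F, G = [], []
    for u in range(n):
        h = H[u]
        hm = H[(n - u) % n]  # H(N - u), with H(N) wrapping to H(0)
        F.append(complex(0.5 * (h.real + hm.real), 0.5 * (h.imag - hm.imag)))
        G.append(complex(0.5 * (h.imag + hm.imag), -0.5 * (h.real - hm.real)))
    return F, G

f = [1.0, 2.0, 3.0, 4.0]
g = [0.0, -1.0, 5.0, 2.0]
tangled = [complex(a, b) for a, b in zip(f, g)]  # h(x) = f(x) + j g(x)
F, G = untangle(dft(tangled))
```

One complex transform plus an O(N) untangle pass replaces two separate transforms.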
Packing Transforms of Real Functions
• We can store the Fourier transform in an array the same size as the input.
– Throw away conjugate duplicates.
– Throw away imaginary values known to be zero.
[Figure: packed array of N entries with regions labeled "Real Values" and "Imaginary Values".]
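A possible packing is sketched below. The exact slot order in the slide's figure is not recoverable here, so the layout is an assumption: the real parts of F(0) through F(N/2) come first, followed by the imaginary parts of F(1) through F(N/2 − 1). The names `pack` and `unpack` are illustrative, and N is assumed to be even.

```python
import cmath

def dft(seq):
    """Forward DFT with the 1/N factor."""
    n = len(seq)
    return [
        sum(seq[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
        for u in range(n)
    ]

def pack(F):
    """Store the spectrum of a real sequence in N real numbers (assumed layout)."""
    n = len(F)
    half = n // 2
    reals = [F[u].real for u in range(half + 1)]  # N/2 + 1 real parts
    imags = [F[u].imag for u in range(1, half)]   # N/2 - 1 imaginary parts
    return reals + imags

def unpack(packed):
    """Rebuild the full complex spectrum using F(N - u) = conj(F(u))."""
    n = len(packed)
    half = n // 2
    F = [0j] * n
    F[0] = complex(packed[0], 0.0)        # imaginary part known to be zero
    F[half] = complex(packed[half], 0.0)  # imaginary part known to be zero
    for u in range(1, half):
        value = complex(packed[u], packed[half + u])
        F[u] = value
        F[n - u] = value.conjugate()
    return F

real_signal = [3.0, 1.0, -2.0, 5.0, 0.5, -1.0, 4.0, 2.0]
spectrum = dft(real_signal)
packed = pack(spectrum)
restored = unpack(packed)
```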
Column-wise FFT
• We have two columns with real values.
– Use same “tangled” approach.
• All other columns are complex numbers.
– Use regular FFT.
[Figure: columns of the row-transformed image, labeled "Real", "Real", and "Paired for Complex".]
Packing 2D Transforms of Real Functions
• Rows transformed from complex values are already packed appropriately.
• The two rows transformed from real values are untangled and packed to follow suit.
[Figure: packed 2D transform with column indices 0 to M − 1 and row indices 0 to N − 1, with regions labeled "Real Values" and "Imaginary Values".]
Available Resources
• nVidia GeForce FX 5800 Ultra.
– Full 32-bit floating point pipeline and frame buffers.
– Fully programmable vertex and fragment units.
• Cg
– High level language for vertex and fragment programs.
• Traditional CPU: 1.7 GHz Intel Xeon
– Freely available high performance FFT implementations.
Implementation
• Using a SIMD model for parallel computation.
– Draw quadrilateral parallel to screen.
– Rasterizer invokes the same fragment program “in parallel” over all pixels covered by quadrilateral.
– Inputs and outputs depend on the location of the pixel on which the fragment program is running.
• We require many rendering passes.
– Use “render to texture” extension.
– Use two frame buffers: one for retrieving values of last pass and one for storing results of current computation.
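The multi-pass, two-buffer pattern can be mimicked on the CPU. This is only a toy model under stated assumptions: nested loops stand in for the rasterizer, plain lists stand in for textures, and the two example "fragment programs" are placeholders rather than the talk's Cg programs.

```python
def render_pass(fragment_program, source, target):
    """Run the same program once per pixel; it reads only source, writes only target."""
    for y in range(len(source)):
        for x in range(len(source[0])):
            target[y][x] = fragment_program(source, x, y)

def run_passes(programs, image):
    """Ping-pong between two buffers: last pass's output is this pass's input."""
    front = [row[:] for row in image]             # holds results of the last pass
    back = [[0.0] * len(row) for row in image]    # receives the current pass
    for program in programs:
        render_pass(program, front, back)
        front, back = back, front                 # swap roles for the next pass
    return front

def scale_by_half(source, x, y):
    """Placeholder program: scale the pixel at the same location."""
    return 0.5 * source[y][x]

def shift_left(source, x, y):
    """Placeholder program: read from a different location than the one written."""
    return source[y][(x + 1) % len(source[0])]

result = run_passes(
    [scale_by_half, shift_left, scale_by_half],
    [[4.0, 8.0], [12.0, 16.0]],
)
```

Because each pass reads one buffer and writes the other, a program may fetch any location of the previous result without racing against its own output.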
Implementation
[Figure: two data-flow diagrams between "Images" and "Frequency Spectra", one using Scale steps and one using Pass steps. Each chains FFT and Untangle stages over buffers labeled Real F, Real G, Imag. F, Imag. G, Real Tangled, Imaginary Tangled, Real Untangled, and Imaginary Untangled.]
Fragment Programs
• Written in Cg, compiled for GeForce FX.
Program     Arithmetic Instructions   Texture Instructions
FFT         27                        3
Untangle    4                         2
Scale       1                         1
Tangle      1                         2
Pass        0                         1
Multiply    66                        4
Performance
• Computation speed: 2.5 GigaFLOPS
• Texture read rate: 3.4 GB/sec
Image Size   Rendering Rate (Hz)   Arithmetic (sec)   Texture Lookup (sec)
1024²        0.37                  1.9                0.6
512²         1.6                   0.44               0.13
256²         6.7                   0.09               0.03
128²         25                    0.01               0.007
Conclusions
• The Fourier transform on the GPU has many potential applications.
• A well-established FFT on the CPU (FFTW) still has an edge over the GPU implementation.
– Both the software and the hardware of the GPU are first generation.
– Room for improvement.
Get the Cg Code
• http://www.cgshaders.org ?
• http://www.cs.unm.edu/~kmorel/documents/fftgpu