The FFT on a GPU

Graphics Hardware 2003
July 27, 2003

Kenneth Moreland, Sandia National Labs
Edward Angel, U. of New Mexico

Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.


Overview

• Introduction
  – Motivation, FFT review.
• FFT Techniques
  – Exploitable FFT properties.
• Implementation
• Results
  – Performance, applications, conclusions.

Motivation

• The Fourier transform is a principal tool for digital image processing.
  – Filtering.
  – Correction.
  – Compression.
  – Classification.
  – Generation.
• As such, should not our graphics hardware support such a tool?

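The Discrete Fourier Transform

• Converts data in the spatial or temporal domain into the frequencies the data comprise.

$$\mathcal{F}\{f(x)\} = F(u) = \frac{1}{N} \sum_{x=0}^{N-1} f(x)\, W_N^{ux}$$

$$\mathcal{F}^{-1}\{F(u)\} = f(x) = \sum_{u=0}^{N-1} F(u)\, W_N^{-ux}$$

$$W_N = e^{-j 2\pi / N}$$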
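The transform pair above can be written out directly. This is a minimal O(N²) sketch for checking the convention only, assuming the 1/N factor sits on the forward transform and W_N = e^{-j2π/N}; the function names `dft` and `idft` are illustrative and not taken from the talk.

```python
import cmath

def dft(f):
    """Forward DFT: F(u) = (1/N) * sum_x f(x) * W_N^(u*x)."""
    n = len(f)
    w = cmath.exp(-2j * cmath.pi / n)  # W_N
    return [sum(f[x] * w ** (u * x) for x in range(n)) / n for u in range(n)]

def idft(F):
    """Inverse DFT: f(x) = sum_u F(u) * W_N^(-u*x), with no scale factor."""
    n = len(F)
    w = cmath.exp(-2j * cmath.pi / n)  # W_N
    return [sum(F[u] * w ** (-u * x) for u in range(n)) for x in range(n)]

signal = [1.0, 2.0, 0.0, -1.0]
spectrum = dft(signal)     # spectrum[0] is the mean of the signal
restored = idft(spectrum)  # matches signal up to rounding error
```

With this convention F(0) is the average of the samples, and applying `idft` after `dft` returns the original sequence.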

The Discrete Fourier Transform

• 2D transform can be computed by applying the transform in one direction, then the other.

DFT:

$$\mathcal{F}\{f(x,y)\} = F(u,v) = \frac{1}{MN} \sum_{y=0}^{N-1} \sum_{x=0}^{M-1} f(x,y)\, W_M^{ux}\, W_N^{vy}$$

IDFT:

$$\mathcal{F}^{-1}\{F(u,v)\} = f(x,y) = \sum_{v=0}^{N-1} \sum_{u=0}^{M-1} F(u,v)\, W_M^{-ux}\, W_N^{-vy}$$
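The separability claim can be checked numerically. The sketch below is an illustration under the same 1/N forward-scaling assumption; it transforms every row, then every column, and compares against the double sum. The helper names are hypothetical.

```python
import cmath

def dft(f):
    """1D forward DFT with the 1/N factor on the forward transform."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
            for u in range(n)]

def dft2_separable(image):
    """2D DFT as 1D transforms along x (rows), then along y (columns)."""
    rows = [dft(row) for row in image]             # indexed [y][u]
    cols = [dft(list(col)) for col in zip(*rows)]  # indexed [u][v]
    return [list(row) for row in zip(*cols)]       # back to [v][u]

def dft2_direct(image):
    """2D DFT evaluated straight from the double sum, for comparison."""
    n, m = len(image), len(image[0])
    return [[sum(image[y][x] * cmath.exp(-2j * cmath.pi * (u * x / m + v * y / n))
                 for y in range(n) for x in range(m)) / (m * n)
             for u in range(m)]
            for v in range(n)]

image = [[1.0, 2.0, 3.0, 4.0],
         [0.0, -1.0, 0.5, 2.0]]  # N = 2 rows, M = 4 columns
separable = dft2_separable(image)
direct = dft2_direct(image)
max_error = max(abs(a - b) for ra, rb in zip(separable, direct) for a, b in zip(ra, rb))
```

The two results agree to rounding error, which is what allows a 2D transform to be built from row passes followed by column passes.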

The Fast Fourier Transform

• Divide and Conquer Algorithm
  – Input sequence is divided into subsequences consisting of values from even and odd indices, respectively.

$$F(u) = F_e(u) + W_N^{u}\, F_o(u)$$

$$f_e(x) = f(2x), \qquad f_o(x) = f(2x + 1)$$
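The even/odd split can be sketched recursively. This is an illustrative version, not the authors' code: it assumes the input length is a power of two and, as an assumption of this sketch, applies the 1/N scale once at the end instead of at every level.

```python
import cmath

def fft_unscaled(f):
    """Recursive radix-2 FFT without the 1/N factor (length must be a power of two)."""
    n = len(f)
    if n == 1:
        return list(f)
    fe = fft_unscaled(f[0::2])  # transform of f_e(x) = f(2x)
    fo = fft_unscaled(f[1::2])  # transform of f_o(x) = f(2x + 1)
    half = n // 2
    # F(u) = F_e(u) + W_N^u * F_o(u); F_e and F_o are periodic with period N/2.
    return [fe[u % half] + cmath.exp(-2j * cmath.pi * u / n) * fo[u % half]
            for u in range(n)]

def fft(f):
    """Forward FFT with the 1/N factor applied as a final step."""
    n = len(f)
    return [value / n for value in fft_unscaled(f)]

spectrum = fft([0.0, 1.0, 0.0, 0.0])  # shifted impulse: F(u) = W_4^u / 4
```

For the shifted impulse the result is W_4^u / 4, that is 0.25, -0.25j, -0.25, 0.25j.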

Index Magic

• Do not use recursion.
  – Use dynamic programming: iterate over entire array computing all values for each recursive depth together, like mergesort.
• Indexing is non-obvious.
  – Unlike mergesort, recursive step does not divide array into contiguous chunks.
  – At any iteration, what partition does a given index belong to, and where can one find the applicable values of the sub-partitions?

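Index Magic

• Common solution: rearrange data by reversing the bits of indices.
  – FFT can occur with contiguous partitions.
  – Requires an extra data copy.
• Our solution: determine indexing in place.

$$A_i[n] = A_{i-1}\!\left[n + u\,\frac{N}{2^i}\right] + W_{2^i}^{u}\, A_{i-1}\!\left[n + u\,\frac{N}{2^i} + \frac{N}{2^i}\right], \qquad u = n \mathbin{\mathrm{div}} \frac{N}{2^i}$$

Array indices are assumed to wrap around modulo N. Note that the paper has a typo.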
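The in-place indexing can be tried out with an iterative loop. The sketch below is one reading of the slide's formula: it assumes u = n div (N/2^i), that indices wrap modulo N, that N is a power of two, and that the 1/N scale is applied after the last stage. It checks the result against a direct DFT; all names are illustrative.

```python
import cmath

def dft_direct(f):
    """O(N^2) reference DFT with the 1/N factor on the forward transform."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
            for u in range(n)]

def fft_indexed(f):
    """Iterative FFT with no bit-reversal copy; each stage reads A_{i-1}, writes A_i."""
    n = len(f)                       # assumed to be a power of two
    stages = n.bit_length() - 1      # log2(N)
    prev = [complex(v) for v in f]   # A_0 is the input in natural order
    for i in range(1, stages + 1):
        stride = n >> i              # N / 2^i
        size = 1 << i                # 2^i
        cur = [0j] * n
        for idx in range(n):
            u = idx // stride                             # u = n div (N / 2^i)
            w = cmath.exp(-2j * cmath.pi * u / size)      # W_{2^i}^u
            even = (idx + u * stride) % n                 # wrap is an assumption
            odd = (idx + u * stride + stride) % n
            cur[idx] = prev[even] + w * prev[odd]
        prev = cur
    return [v / n for v in prev]     # final scale pass

samples = [1.0, -2.0, 3.5, 0.0, 4.0, 1.5, -1.0, 2.0]
fast = fft_indexed(samples)
slow = dft_direct(samples)
max_error = max(abs(a - b) for a, b in zip(fast, slow))
```

Both input and output stay in natural order, and every stage reads one array and writes another, which maps onto a pair of buffers that swap roles between passes.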

Fourier Symmetry of Real Sequences

• In general, the frequency spectrum of even a real-valued function contains imaginary values.
  – Captures magnitude and phase shift of sinusoids.
• Brute force FFT doubles computation and storage costs.
• But Fourier transforms of real functions have symmetry.
  – $F(u) = F^*(N - u)$ for all $u$.
  – Values at $F(0)$ and $F(N/2)$ are real (because they are conjugates with themselves).
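The conjugate symmetry is easy to confirm numerically. A small sketch, using an arbitrary real test sequence and an illustrative `dft` helper:

```python
import cmath

def dft(f):
    """Forward DFT with the 1/N factor on the forward transform."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
            for u in range(n)]

real_signal = [3.0, 1.0, -2.0, 5.0, 0.5, -1.0, 4.0, 2.0]
F = dft(real_signal)
N = len(F)

# F(u) equals the conjugate of F(N - u); index N wraps to 0.
symmetric = all(abs(F[u] - F[(N - u) % N].conjugate()) < 1e-9 for u in range(N))

# F(0) and F(N/2) pair with themselves, so their imaginary parts vanish.
self_conjugate_imag = (F[0].imag, F[N // 2].imag)
```

Only F(0) through F(N/2) carry independent information, which is what the later packing step exploits.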

Fourier Transform of Real Functions

• Pick two functions, let them be f(x) and g(x).
• Let h(x) = f(x) + j g(x).
  – Note that there is no loss of information.
• Can perform FFT of h in half the time of performing the brute force FFT of f and g individually.
  – Simply point to one row of image as real components and another as imaginary components.

[Figure: image regions labeled f and g.]

Untangling Fourier Transform Pairs

• Fourier transform is linear.
  – H(u) = F(u) + j G(u)
• We can “untangle” using symmetry of F and G.
  – Add and subtract H(u) and H(N – u) to cancel out conjugate terms of F and G.

$$H(u) + H(N-u) = F(u) + F(N-u) + j\,[G(u) + G(N-u)] = 2F_R(u) + 2j\,G_R(u)$$

$$H(u) - H(N-u) = 2j\,F_I(u) - 2\,G_I(u)$$

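Untangling Fourier Transform Pairs

$$F_R(u) = \tfrac{1}{2}\,[H_R(u) + H_R(N-u)]$$

$$F_I(u) = \tfrac{1}{2}\,[H_I(u) - H_I(N-u)]$$

$$G_R(u) = \tfrac{1}{2}\,[H_I(u) + H_I(N-u)]$$

$$G_I(u) = -\tfrac{1}{2}\,[H_R(u) - H_R(N-u)]$$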
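The tangle and untangle steps can be checked end to end. The sketch below packs two real sequences f and g into h = f + j g, transforms h once, and separates F and G using the real (R) and imaginary (I) parts of H(u) and H(N - u). The names `dft` and `untangle` are illustrative, and a direct DFT stands in for the FFT.

```python
import cmath

def dft(h):
    """Forward DFT with the 1/N factor on the forward transform."""
    n = len(h)
    return [sum(h[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
            for u in range(n)]

def untangle(H):
    """Split H = F + jG into F and G, where f and g were real sequences."""
    n = len(H)
    F, G = [], []
    for u in range(n):
        a, b = H[u], H[(n - u) % n]  # H(u) and H(N - u); index N wraps to 0
        F.append(complex(0.5 * (a.real + b.real), 0.5 * (a.imag - b.imag)))
        G.append(complex(0.5 * (a.imag + b.imag), -0.5 * (a.real - b.real)))
    return F, G

f = [1.0, 4.0, -2.0, 0.5]
g = [0.0, 3.0, 3.0, -1.0]
h = [complex(a, b) for a, b in zip(f, g)]  # tangle: h(x) = f(x) + j g(x)
F, G = untangle(dft(h))
F_direct, G_direct = dft(f), dft(g)        # separate transforms, for comparison
```

One complex transform plus a cheap untangle pass yields the same F and G as two separate transforms.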

Packing Transforms of Real Functions

• We can store Fourier transform in an array the same size as the input.
  – Throw away conjugate duplicates.
  – Throw away imaginary values known to be zero.

[Figure: packed array of length N with index labels from 0 through N/2 to N − 1; regions labeled "Real Values" and "Imaginary Values".]
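One way to realize such a packing is sketched below. The exact slot assignment is an assumption of this sketch, not something the slide states: real parts of F(0) through F(N/2) go in slots 0 through N/2, and the imaginary part of F(u) for 0 < u < N/2 goes in slot N/2 + u. N is assumed even, and the function names are hypothetical.

```python
import cmath

def dft(f):
    """Forward DFT with the 1/N factor on the forward transform."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * u * x / n) for x in range(n)) / n
            for u in range(n)]

def pack(F):
    """Store the spectrum of a real sequence in N real slots (assumed layout)."""
    n = len(F)
    half = n // 2
    packed = [0.0] * n
    for u in range(half + 1):
        packed[u] = F[u].real         # real parts of F(0) .. F(N/2)
    for u in range(1, half):
        packed[half + u] = F[u].imag  # imaginary parts of F(1) .. F(N/2 - 1)
    return packed

def unpack(packed):
    """Rebuild the full complex spectrum from the packed array."""
    n = len(packed)
    half = n // 2
    F = [0j] * n
    F[0] = complex(packed[0], 0.0)        # imaginary part known to be zero
    F[half] = complex(packed[half], 0.0)  # imaginary part known to be zero
    for u in range(1, half):
        F[u] = complex(packed[u], packed[half + u])
        F[n - u] = F[u].conjugate()       # restore the conjugate duplicate
    return F

spectrum = dft([3.0, 1.0, -2.0, 5.0, 0.5, -1.0, 4.0, 2.0])
packed = pack(spectrum)
restored = unpack(packed)
```

The packed array has exactly N real entries, and unpacking recovers all N complex values.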

Column-wise FFT

• We have two columns with real values.
  – Use same “tangled” approach.
• All other columns are complex numbers.
  – Use regular FFT.

[Figure: packed image with two columns labeled "Real" and the remaining columns labeled "Paired for Complex".]

Packing 2D Transforms of Real Functions

• Rows transformed from complex values are already packed appropriately.
• The two rows transformed from real values are untangled and packed to follow suit.

[Figure: packed 2D array with horizontal index labels from 0 through M/2 to M − 1 and vertical index labels from 0 through N/2 to N − 1; regions labeled "Real Values" and "Imaginary Values".]

Available Resources

• nVidia GeForce FX 5800 Ultra.
  – Full 32-bit floating point pipeline and frame buffers.
  – Fully programmable vertex and fragment units.
• Cg
  – High level language for vertex and fragment programs.
• Traditional CPU: 1.7 GHz Intel Xeon
  – Freely available high performance FFT implementations.

Implementation

• Using a SIMD model for parallel computation.
  – Draw quadrilateral parallel to screen.
  – Rasterizer invokes the same fragment program “in parallel” over all pixels covered by quadrilateral.
  – Inputs and outputs depend on the location of the pixel on which the fragment program is running.
• We require many rendering passes.
  – Use “render to texture” extension.
  – Use two frame buffers: one for retrieving values of last pass and one for storing results of current computation.
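The multi-pass scheme can be mimicked on the CPU to make the data flow concrete. In this sketch Python lists stand in for textures, a plain function stands in for a fragment program, and the two buffers swap roles after each pass. All names (`render_pass`, `run_passes`, the two example programs) are hypothetical and do not correspond to any graphics API.

```python
def render_pass(fragment_program, source, target):
    """Invoke the same program once per pixel; it may only read from source."""
    def fetch(x, y):
        return source[y][x]  # stands in for a texture lookup
    for y in range(len(target)):
        for x in range(len(target[0])):
            target[y][x] = fragment_program(x, y, fetch)

def run_passes(image, programs):
    """Chain passes over two buffers that alternate between read and write roles."""
    height, width = len(image), len(image[0])
    buffers = [[row[:] for row in image],
               [[0.0] * width for _ in range(height)]]
    src = 0
    for program in programs:
        dst = 1 - src
        render_pass(program, buffers[src], buffers[dst])
        src = dst            # the buffer just written becomes the next input
    return buffers[src]

def scale_by_half(x, y, fetch):
    return 0.5 * fetch(x, y)

def add_right_neighbor(x, y, fetch):
    return fetch(x, y) + fetch((x + 1) % 4, y)  # wraps across a width of 4

result = run_passes([[1.0, 2.0, 3.0, 4.0]], [scale_by_half, add_right_neighbor])
# result == [[1.5, 2.5, 3.5, 2.5]]
```

Because a pass never reads the buffer it writes, every pixel can be computed independently, which is the property the SIMD model relies on.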

Implementation

[Figure: two data-flow diagrams between "Images" and "Frequency Spectra". Each chains FFT and Untangle stages twice; one diagram ends in Scale steps and the other in Pass steps. Buffers are labeled "Real, Tangled", "Imag., Tangled", "Real Untangled", "Imaginary Untangled", and "Real F", "Imag. F", "Real G", "Imag. G".]

Fragment Programs

• Written in Cg, compiled for GeForce FX.

             Instructions
Program      Arithmetic    Texture
FFT          27            3
Untangle     4             2
Scale        1             1
Tangle       1             2
Pass         0             1
Multiply     66            4


Applications

• Digital image filtering.


Applications

• Texture generation.

• Volume rendering.

Performance

• Computation speed: 2.5 GigaFLOPS
• Texture read rate: 3.4 GB/sec

Image Size    Rendering Rate (Hz)    Arithmetic (sec)    Texture Lookup (sec)
1024²         0.37                   1.9                 0.6
512²          1.6                    0.44                0.13
256²          6.7                    0.09                0.03
128²          25                     0.01                0.007

Conclusions

• The Fourier transform on the GPU has many potential applications.
• A well established FFT on the CPU (FFTW) still has an edge over the GPU implementation.
  – Both software and hardware of the GPU are first generation.
  – Room for improvement.


Questions?