Fastpoissonslides Using FFT

  • Upload
    stkegg

  • View
    226

  • Download
    0

Embed Size (px)

Citation preview

  • 8/3/2019 Fastpoissonslides Using FFT

    1/25

    Fast Poisson Solvers and FFT

    Tom Lyche

    University of Oslo

    Norway

    Fast Poisson Solvers and FFT p. 1/

    http://www.ifi.uio.no/http://www.ifi.uio.no/
  • 8/3/2019 Fastpoissonslides Using FFT

    2/25

    Contents of lecture

    Recall the discrete Poisson problem

    Recall methods used so far

    Exact solution by diagonalization

    Exact solution by Fast Sine Transform

    The sine transform

    The Fourier transform

    The Fast Fourier Transform

    The Fast Sine Transform

    Algorithm

    Fast Poisson Solvers and FFT p. 2/

  • 8/3/2019 Fastpoissonslides Using FFT

    3/25

    Recall 5 point-scheme for the Poisson Problem

    (u +u ) =f u=0

    u=0

    u=0

    u=0yyxx

    h = 1/(m + 1) gives n := m2 interior grid points

    Discrete approximation V= [vj,k] Rm,m with vj,k u(jh,kh).Matrix equation TV+ V T = h2F with T = tridiag(

    1, 2,

    1)

    Rm,m

    and F = [f(jh,kh)] Rm,m

    Linear system Ax = b with A = T V+ V T, x = vec(V) Rnand b = h2vec(F) Rn

    Fast Poisson Solvers and FFT p. 3/

  • 8/3/2019 Fastpoissonslides Using FFT

    4/25

    Compare #flops for different methods

    System Ax = b of order n = m2 and bandwidth m = n.full Cholesky: O(n3)

    band Cholesky, block tridiagonal: O(n2)

    If m = 103 then the computing time is

    full Cholesky: years

    band Cholesky and block tridiagonal , : hours

    Derive a method which takes : seconds

    Fast Poisson Solvers and FFT p. 4/

  • 8/3/2019 Fastpoissonslides Using FFT

    5/25

    New exact fast method

    1. Diagonalization of T

    2. Fast Sine transform

    restricted to rectangular domains

    can be extended to 9 point scheme and biharmonic problem

    can be extended to 3D

    Fast Poisson Solvers and FFT p. 5/

  • 8/3/2019 Fastpoissonslides Using FFT

    6/25

  • 8/3/2019 Fastpoissonslides Using FFT

    7/25

  • 8/3/2019 Fastpoissonslides Using FFT

    8/25

    Find Vusing diagonalization

    We define a matrix X by V= SXS, where V is thesolution of TV+ V T = h2F

    TV+ V T = h2F

    V=SXS TSXS+ SXST= h2FS( )S

    STSXS2 + S2XSTS= h2SFS

    TS=SD S2DXS2 + S2XS2D = h2SFSS2=I/(2h)

    DX+XD = 4h4SFS.

    Fast Poisson Solvers and FFT p. 8/

  • 8/3/2019 Fastpoissonslides Using FFT

    9/25

    X is easy to find

    An equation of the form DX+ XD = B for some B iseasy to solve.

    Since D = diag(j) we obtain for each entry

    jxjk + xjkk = bjk

    so xjk = bjk/(j + k) for all j, k.

    Fast Poisson Solvers and FFT p. 9/

  • 8/3/2019 Fastpoissonslides Using FFT

    10/25

    Algorithm

    Algorithm 1 (A Simple Fast Poisson Solver).

    1. h = 1/(m + 1);F =

    f(jh,kh)

    m

    j,k=1;

    S=

    sin(jkh)mj,k=1; =

    sin

    2

    ((jh)/2)mj=1

    2. G = (gj,k) = SFS;

    3. X= (xj,k)mj,k=1, wherexj,k = h

    4gj,k/(j + k);

    4. V= SXS;

    (1)

    Output is the exact solution of the discrete Poisson equation on a

    square computed in O(n3/2) operations.

    Only a couple of mm matrices are required for storage.Next: Use FFT to reduce the complexity to O(n log2 n)

    Fast Poisson Solvers and FFT p. 10/

  • 8/3/2019 Fastpoissonslides Using FFT

    11/25

  • 8/3/2019 Fastpoissonslides Using FFT

    12/25

    The product B = AS

    The product B = AScan also be interpreted as a DST.

    Indeed, since S is symmetric we have B = (SAT)T

    which means that B is the transpose of the DST of the

    rows of A.It follows that we can compute the unknowns V inAlgorithm 1 by carrying out Discrete Sine Transforms on

    4 m-by-m matrices in addition to the computation of X.

    Fast Poisson Solvers and FFT p. 12/

  • 8/3/2019 Fastpoissonslides Using FFT

    13/25

    The Euler formula

    ei = cos + i sin , R, i =1

    ei = cos

    i sin

    cos =ei + ei

    2, sin =

    ei ei2i

    Consider

    N := e2i/N = cos

    2

    N i sin

    2

    N, NN = 1

    Note that

    jk2m+2 = e2jki/(2m+2) = ejkhi = cos(jkh) i sin(jkh).

    Fast Poisson Solvers and FFT p. 13/

  • 8/3/2019 Fastpoissonslides Using FFT

    14/25

    Discrete Fourier Transform (DFT)

    If

    zj =N

    k=1

    (j1)(k1)N yk, j = 1, . . . , N

    then z = [z1, . . . , zN]T is the DFT of y = [y1, . . . , yN]

    T.

    z = FNy, where FN := (j1)(k1)N Nj,k=1, RN,N

    FN is called the Fourier Matrix

    Fast Poisson Solvers and FFT p. 14/

  • 8/3/2019 Fastpoissonslides Using FFT

    15/25

    Example

    4 = exp2i/4 = cos(2 ) i sin(2 ) = i

    F4 =

    1 1 1 1

    1 4 24

    34

    1 24 44

    64

    1 34 64

    94

    =

    1 1 1 1

    1 i 1 i1 1 1 11 i 1 i

    Fast Poisson Solvers and FFT p. 15/

  • 8/3/2019 Fastpoissonslides Using FFT

    16/25

    Connection DST and DFT

    DST of order m can be computed from DFT of orderN = 2m + 2 as follows:

    Lemma 1. Given a positive integerm and a vectorx Rm.Componentk ofS

    mx is equal toi/2 times componentk + 1 of

    F2m+2z where

    z = (0, x1, . . . , xm, 0,xm,xm1, . . . ,x1)T R2m+2.

    In symbols

    (Smx)k =i

    2

    (F2m+2z)k+1 , k = 1, . . . , m .

    Fast Poisson Solvers and FFT p. 16/

  • 8/3/2019 Fastpoissonslides Using FFT

    17/25

    Fast Fourier Transform (FFT)

    Suppose N is even. Express FN in terms of FN/2. Reorder the columnsin FN so that the odd columns appear before the even ones.

    PN := (e1, e3, . . . , eN1, e2, e4, . . . , eN)

    F4 =

    1 1 1 1

    1 i 1 i1 1 1 1

    1 i 1 i

    F4P4 =

    1 1 1 1

    1 1 i i1 1 1 1

    1 1 i i

    .

    F2 = 1 1

    1 2 =

    1 1

    1 1 , D2 = diag(1, 4) =

    1 0

    0 i .

    F4P4 =

    F2 D2F2

    F2D2F2

    Fast Poisson Solvers and FFT p. 17/

  • 8/3/2019 Fastpoissonslides Using FFT

    18/25

    F2m, P2m, Fm and Dm

    Theorem 2. IfN = 2m is even then

    F2mP2m =

    Fm DmFm

    FmDmFm

    , (2)

    where

    Dm = diag(1, N, 2N, . . . ,

    m1N ). (3)

    Fast Poisson Solvers and FFT p. 18/

  • 8/3/2019 Fastpoissonslides Using FFT

    19/25

    Proof

    Fix integers j, k with 0 j,k m 1 and set p = j + 1 and q = k + 1.Since mm = 1,

    2N = m, and

    mN = 1 we find by considering elements in

    the four sub-blocks in turn

    (F2mP2m)p,q = j(2k)N = jkm = (Fm)p,q,

    (F2mP2m)p+m,q = (j+m)(2k)N =

    (j+m)km = (Fm)p,q,

    (F2mP2m)p,q+m = j(2k+1)N =

    jN

    jkm = (DmFm)p,q,

    (F2mP2m)p+m,q+m = (j+m)(2k+1)

    N = j(2k+1)

    N = (DmFm)p,q

    It follows that the four m-by-m blocks of F2mP2m have the required

    structure.

    Fast Poisson Solvers and FFT p. 19/

  • 8/3/2019 Fastpoissonslides Using FFT

    20/25

    The basic step

    Using Theorem 2 we can carry out the DFT as a block multiplication.

    Let y R2m and set w = PT2my = (wT1 ,wT2 )T, where w1,w2 Rm.Then F2my = F2mP2mP

    T2my = F2mP2mw and

    F2mP2mw =

    Fm DmFmFm DmFm

    w1w2

    = q1 + q2q1 q2

    , whereq1 = Fmw1 and q2 = Dm(Fmw2).

    In order to compute F2my we need to compute Fmw1 and Fmw2.

    Note that wT1 = [y1, y3, . . . , yN1], while wT2 = [y2, y4, . . . , yN].

    This follows since wT = [wT1 ,wT2 ] = y

    TP2m and post multiplying a

    vector by P2m moves odd indexed components to the left of all theeven indexed components.

    Fast Poisson Solvers and FFT p. 20/

  • 8/3/2019 Fastpoissonslides Using FFT

    21/25

    Recursive FFT

    Recursive Matlab function when N = 2k

    Algorithm 3. (Recursive FFT )

    function z=fftrec(y)

    n=length(y);

    if n==1 z=y;

    else

    q1=fftrec(y(1:2:n-1));

    q2=exp(-2*pi*i/n). (0:n/2-1).*fftrec(y(2:2:n));z=[q1+q2 q1-q2];

    end

    Such a recursive version of FFT is useful for testing purposes, but is much

    too slow for large problems. A challenge for FFT code writers is to develop

    nonrecursive versions and also to handle efficiently the case where N is

    not a power of two.

    Fast Poisson Solvers and FFT p. 21/

    FFT C l it

  • 8/3/2019 Fastpoissonslides Using FFT

    22/25

    FFT Complexity

    Let xk be the complexity (the number of flops) when N = 2k.

    Since we need two FFTs of order N/2 = 2k1 and a multiplication

    with the diagonal matrix DN/2 it is reasonable to assume that

    xk

    = 2xk

    1+ 2k for some constant independent of k.

    Since x0 = 0 we obtain by induction on k that

    xk = k2k = Nlog2 N.

    This also holds when N is not a power of 2.

    Reasonable implementations of FFT typically have 5

    Fast Poisson Solvers and FFT p. 22/

    I t i FFT

  • 8/3/2019 Fastpoissonslides Using FFT

    23/25

    Improvement using FFT

    The efficiency improvement using the FFT to compute the DFT isimpressive for large N.

    The direct multiplication FNy requires O(8n2) flops since complex

    arithmetic is involved.

    Assuming that the FFT uses 5Nlog2 N flops we find for

    N = 220 106 the ratio

    8N2

    5Nlog2 N 84000.

    Thus if the FFT takes one second of computing time and if the

    computing time is proportional to the number of flops then the direct

    multiplication would take something like 84000 seconds or 23 hours.

    Fast Poisson Solvers and FFT p. 23/

    P i l b d FFT

  • 8/3/2019 Fastpoissonslides Using FFT

    24/25

    Poisson solver based on FFT

    requires O(n log2 n) flops, where n = m2

    is the size of the linearsystem Ax = b. Why?

    4m sine transforms: G1 = SF, G = G1S, V1 = SX, V= V1S.

    A total of 4m FFTs of order 2m + 2 are needed.Since one FFT requires O((2m + 2) log2(2m + 2) flops the 4m FFTs

    amounts to 8m(m + 1) log2(2m + 2) 8m2 log2 m = 4n log2 n,

    This should be compared to the O(8n3/2

    ) flops needed for 4straightforward matrix multiplications with S.

    What is faster will depend on the programming of the FFT and the

    size of the problem.

    Fast Poisson Solvers and FFT p. 24/

    C t th d

  • 8/3/2019 Fastpoissonslides Using FFT

    25/25

    Compare exact methods

    System Ax = b of order n = m2

    .Assume m = 1000 so that n = 106.

    Method # flops Storage T = 109flops

    Full Cholesky 13n3 = 1310

    18 n2 = 1012 10 years

    Band Cholesky 2n2 = 2 1012 n3/2 = 109 1/2 hourBlock Tridiagonal 2n2 = 2 1012 n3/2 = 109 1/2 hourDiagonalization 8n3/2 = 8 109 n = 106 8 secFFT 20n log2 n = 4 108 n = 106 1/2 sec

    Fast Poisson Solvers and FFT p. 25/