Parallelization of FFT in AFNI

Huang, Jingshan; Xi, Hong

Department of Computer Science and Engineering, University of South Carolina

Motivation

AFNI: a widely used software package for medical image processing

Drawback: not a real-time system

Our goal: make a parallelized version of AFNI

First step: parallelize the FFT part of AFNI

Outline

What is AFNI

FFT in AFNI

Introduction to MPI

Our method of parallelization

Experimental results and analysis

Conclusion

What is AFNI?

AFNI stands for Analysis of Functional NeuroImages.

It is a set of C programs (over 1,000 source code files) for processing, analyzing, and displaying functional MRI (FMRI) data. FMRI is a technique for mapping human brain activity.

AFNI is an interactive program for viewing the results of 3D functional neuroimaging.

How to run AFNI?

Log on to the cluster machine (daniel.cse.sc.edu)

Go to the directory /home/ramsey/newafnigo

Run “afni”

The AFNI interface should then appear

AFNI Interfaces

[Figures: AFNI interface screenshots (images not preserved); view panels labeled Axial, Sagittal, and Coronal]

FFT in AFNI

Fast Fourier Transform (FFT): a fast algorithm for the finite (discrete) Fourier transform, which maps samples in the discrete time or spatial domain to the discrete frequency domain

Reduces the number of computations needed for N points from O(N^2) to O(N lg N)

Extensively used in AFNI

Parallelizing the FFT is therefore of great significance for AFNI
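The complexity difference can be made concrete with a small sketch. It is written in Python for brevity and is not AFNI's C implementation; the function names `dft_naive`, `fft_radix2`, and `max_abs_diff` are illustrative only. The direct transform evaluates every output point as a sum over all N inputs, while the recursive radix-2 version halves the problem at each level.

```python
import cmath


def dft_naive(x):
    """Direct discrete Fourier transform: N outputs, each a sum of N terms."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_radix2(x):
    """Recursive radix-2 FFT; the length must be a power of two."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of the even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def max_abs_diff(a, b):
    """Largest element-wise difference between two complex sequences."""
    return max(abs(p - q) for p, q in zip(a, b))


if __name__ == "__main__":
    signal = [float((3 * i) % 7) for i in range(64)]
    print(max_abs_diff(dft_naive(signal), fft_radix2(signal)))
```

Both functions compute the same transform up to floating-point rounding; only the operation count differs.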

What is MPI?

MPI stands for Message-Passing Interface.

MPI is a widely used standard for developing message-passing parallel programs.

MPI specifies a library of functions that can be called from a C or Fortran program.

The foundation of this library is a small group of functions that can be used to achieve parallelism by message passing.

What is Message Passing?

Explicitly transmits data from one process to another

Powerful and very general method of expressing parallelism

Drawback --- “assembly language of parallel computing”
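The idea of explicit transmission can be illustrated with a toy sketch. It does not use MPI: as an assumption made purely for illustration, two Python threads stand in for two processes and `queue.Queue` objects stand in for the message channels. The names `worker` and `sum_with_message_passing` are hypothetical.

```python
import queue
import threading


def worker(inbox, outbox):
    """Stand-in for 'rank 1': receive a chunk, reduce it, send the result back."""
    chunk = inbox.get()      # blocking receive of the data
    outbox.put(sum(chunk))   # explicit send of the partial result


def sum_with_message_passing(data):
    """Stand-in for 'rank 0': send half the data away, then combine results."""
    to_worker, from_worker = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(to_worker, from_worker))
    thread.start()
    half = len(data) // 2
    to_worker.put(data[half:])   # explicit send of the second half
    local = sum(data[:half])     # work on the first half in the meantime
    remote = from_worker.get()   # blocking receive of the partial sum
    thread.join()
    return local + remote


if __name__ == "__main__":
    print(sum_with_message_passing(list(range(100))))
```

Every transfer is written out by hand, which is what makes the style general but also low-level.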

What does MPI do for us?

Makes it possible to write libraries of parallel programs that are both portable and efficient

Use of these libraries will hide many of the details of parallel programming

This makes parallel computing much more accessible to professionals in all branches of science and engineering

Our Objective

To parallelize the FFT part of AFNI

In AFNI, a call to the FFT is in fact a call to the csfft_cox() function, whose details are shown on the next slide

Flow Chart of csfft_cox

[Flow chart; image not preserved. Node labels: csfft_cox start; fft2, fft4, fft8, fft16, fft32, fft64, fft128, fft256, fft512, fft1024, fft2048, fft4096, fft8192, fft16384, fft32768; fft_4dec (several occurrences); fft_3dec (3n); fft_5dec (5n); SCLINV; return]

One-level parallelization

There are several ways to parallelize the csfft_cox() function.

At present we adopt one-level parallelization: the work is divided only at the points where fft4096() calls fft1024() and where fft8192() calls fft2048().
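A minimal sketch of why this level is a natural place to split the work, assuming the larger transforms are built by decimation by 4 (4096 = 4 x 1024 and 8192 = 4 x 2048). It is written in Python and is not AFNI's code; `fft_split4`, `dft`, and the `run_all` hook are hypothetical names. The four sub-transforms do not depend on each other, so each could be handed to a different process before the results are combined.

```python
import cmath


def dft(x):
    """Direct DFT, used here as a stand-in for a fixed-length sub-transform."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def fft_split4(x, sub_fft=dft, run_all=map):
    """One level of decimation by 4: four sub-transforms, then a combine step.

    run_all is the hook where the four independent sub-transforms could be
    distributed; the default built-in map runs them one after another.
    """
    n = len(x)
    if n % 4:
        raise ValueError("length must be a multiple of 4")
    m = n // 4
    parts = [x[r::4] for r in range(4)]    # four interleaved subsequences
    subs = list(run_all(sub_fft, parts))   # independent length-N/4 transforms
    out = []
    for k in range(n):
        # Each output mixes the four sub-results with twiddle factors W_N^(r*k).
        out.append(sum(
            cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m]
            for r in range(4)
        ))
    return out


if __name__ == "__main__":
    signal = [complex(i % 5, i % 3) for i in range(64)]
    print(max(abs(a - b) for a, b in zip(fft_split4(signal), dft(signal))))
```

An executor's `map` method from `concurrent.futures` has the same calling shape as the built-in `map` and could be passed as `run_all`. In an MPI version the same step would become a distribution of the four subsequences followed by a collection of the results, with the communication cost that implies.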

Correctness of our parallel code

By applying the FFT and then the inverse FFT (IFFT) consecutively, we obtain a set of complex numbers that are almost identical to those in the original data file

The only difference comes from floating-point rounding error, which also occurs in the original code

So, what is the speedup then?
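The round-trip correctness check described on this slide can be sketched as follows. This is a Python illustration with hypothetical function names, not the test harness used for AFNI; the inverse transform here divides by N so that the round trip returns the input.

```python
import cmath


def _transform(x, sign):
    """Unscaled recursive radix-2 transform; sign=-1 forward, sign=+1 inverse."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = _transform(x[0::2], sign)
    odd = _transform(x[1::2], sign)
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def _check_length(n):
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")


def fft(x):
    """Forward transform."""
    _check_length(len(x))
    return _transform(x, -1)


def ifft(x):
    """Inverse transform, scaled by 1/N."""
    _check_length(len(x))
    n = len(x)
    return [v / n for v in _transform(x, +1)]


def round_trip_error(x):
    """Largest difference between x and IFFT(FFT(x))."""
    return max(abs(a - b) for a, b in zip(x, ifft(fft(x))))


if __name__ == "__main__":
    data = [complex(i % 5, (2 * i) % 3) for i in range(256)]
    print(round_trip_error(data))
```

The reported error is expected to be tiny but not necessarily zero, which matches the floating-point differences noted above.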

Two Kinds of Time

There are two kinds of time to consider when analyzing our experimental results: CPU Time and Wall Clock Time (Elapsed Time).

CPU time is the time spent in the calculation part of the code.

Wall Clock Time is the total elapsed time from the user’s point of view.
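The distinction can be seen in a short Python sketch, which is not how the experiments in this deck were instrumented. `time.process_time()` counts CPU time used by the process, while `time.perf_counter()` measures elapsed time; the helper names `measure` and `busy_then_wait` are hypothetical.

```python
import time


def measure(fn, *args):
    """Run fn(*args) and return (result, cpu_seconds, wall_seconds)."""
    wall_start = time.perf_counter()   # elapsed-time clock
    cpu_start = time.process_time()    # user + system CPU time of this process
    result = fn(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, cpu, wall


def busy_then_wait(n, pause):
    """Compute for a while, then sleep; sleeping adds wall time but not CPU time."""
    total = sum(i * i for i in range(n))
    time.sleep(pause)
    return total


if __name__ == "__main__":
    _, cpu, wall = measure(busy_then_wait, 200000, 0.5)
    print("cpu", cpu, "wall", wall)
```

Waiting, whether on a sleep, on I/O, or on other users' jobs, shows up in the wall clock time but not in the CPU time.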

Experiments

Time analysis of original code (4096 * 200,000 * 1)

starting 200000 FFTs of length 4096 -- 1 at a time
TIME 1 **********************************************************************
TIME 1 beginning 0 0.00
TIME 1 Abeginning 0.00 u 0.00 s: 0.00 u_t 0.00 s_t
TIME 1 Bbeginning 0.00 u 0.00 s: 0.00 u_t 0.00 s_t
TIME 1 **********************************************************************
Using csfft
TIME 2 **********************************************************************
TIME 2 ending 0 155.09
TIME 2 Aending 155.09 u 30.60 s: 155.09 u_t 30.60 s_t
TIME 2 Bending 155.09 u 30.60 s: 155.09 u_t 30.60 s_t
TIME 2 **********************************************************************
wall clock time = 813.324630

Experiments --- Cont.

Time analysis of code parallelized on 2 processors (4096 * 200,000 * 1)

starting 200000 FFTs of length 4096 -- 1 at a time
TIME 1 **********************************************************************
TIME 1 beginning 0 0.00
TIME 1 beginning 1 0.00
TIME 1 Abeginning 0.00 u 0.00 s: 0.00 u_t 0.00 s_t
TIME 1 Bbeginning 0.00 u 0.00 s: 0.00 u_t 0.00 s_t
TIME 1 **********************************************************************
Using csfft
TIME 2 **********************************************************************
TIME 2 ending 0 168.09
TIME 2 ending 1 85.66
TIME 2 Aending 253.75 u 115.11 s: 253.75 u_t 115.11 s_t
TIME 2 Bending 126.87 u 57.55 s: 126.87 u_t 57.55 s_t
TIME 2 **********************************************************************
wall clock time = 679.795504

Experiments --- Cont.

Time analysis of code parallelized on 4 processors (4096 * 200,000 * 1)

starting 100000 FFTs of length 4096 -- 1 at a time
TIME 1 **********************************************************************
TIME 1 beginning 0 0.00
TIME 1 beginning 1 0.00
TIME 1 beginning 2 0.00
TIME 1 beginning 3 0.00
TIME 1 Abeginning 0.00 u 0.00 s: 0.00 u_t 0.00 s_t
TIME 1 Bbeginning 0.00 u 0.00 s: 0.00 u_t 0.00 s_t
TIME 1 **********************************************************************
Using csfft
TIME 2 **********************************************************************
TIME 2 ending 0 139.71
TIME 2 ending 1 71.39
TIME 2 ending 2 57.29
TIME 2 ending 3 61.77
TIME 2 Aending 180.16 u 114.53 s: 180.16 u_t 114.53 s_t
TIME 2 Bending 45.04 u 28.63 s: 45.04 u_t 28.63 s_t
TIME 2 **********************************************************************
wall clock time = 946.552041

Analysis of speedup

                                CPU Time           Wall Clock Time
Original code                   155.09             813.324630
Parallelized on 2 processors    168.09 (rank 0)    679.795504
                                85.66 (rank 1)
Parallelized on 4 processors    139.71 (rank 0)    946.552041
                                71.39 (rank 1)
                                57.29 (rank 2)
                                61.77 (rank 3)
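As a worked example, the wall clock speedup can be computed as the original time divided by the parallel time. The sketch below is in Python and rests on two assumptions: that the wall clock figures read as 813.324630, 679.795504, and 946.552041, and that all three runs processed the same workload. The names `WALL`, `speedup`, and `efficiency` are illustrative only.

```python
# Wall clock times from the experiments, keyed by number of processors.
WALL = {1: 813.324630, 2: 679.795504, 4: 946.552041}


def speedup(t_serial, t_parallel):
    """Classic speedup ratio: values above 1 mean the parallel run was faster."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup divided by processor count; 1.0 would be ideal scaling."""
    return speedup(t_serial, t_parallel) / processors


if __name__ == "__main__":
    for p in (2, 4):
        print(p, round(speedup(WALL[1], WALL[p]), 3),
              round(efficiency(WALL[1], WALL[p], p), 3))
```

Under those assumptions the two-processor run has a wall clock speedup of roughly 1.2, while the four-processor run has a ratio below 1, meaning it took longer than the original code.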

Analysis of speedup --- Cont.

Two main reasons why we did not obtain the ideal speedup:

1. Our processes competed with other users' jobs for the same CPUs.

2. Communication cost and other overhead make the ideal speedup unattainable on real machines.

Conclusion

We have parallelized the FFT part of the AFNI software package using MPI. The results show that, for the FFT algorithm itself, we obtain a speedup of around 30 percent.

Increase the speedup of the FFT

Parallelization of the 3dDeconvolve program

Questions?