Accelerating Coherent Pulsar De-dispersion on
Graphics Processing Units
by Arjun Radhakrishnan
supervised by Prof. Michael Inggs
Outline
Graphics Processing Units (GPUs)
Pulsars
Pulsar De-dispersion
Motivation
Implementation
Results
Conclusion & Future Work
Graphics Processing Units
GPUs are massively parallel processors that are present on consumer graphics cards
Generally used to render 3D objects on screen and calculate the colour of each pixel to display
Mass-market products, thanks to the video game industry
Performance tracks Moore's Law, since most of the on-chip area is devoted to compute units, whereas CPUs devote much of theirs to cache
*Source: [7]
Why Use GPUs?
Figure 1: Peak floating point performance of NVIDIA GPUs vs Intel CPUs [2]
Pulsars
Highly magnetised, rapidly rotating neutron stars formed after a supernova
Pulsars emit beams of electromagnetic radiation from their magnetic poles
Beams sweep in a circular path, known as the “lighthouse effect”
Produce periodic pulses each time a beam sweeps past Earth
Figure 2: Pulsar Model [3]
Pulsar Dispersion
Pulsar emissions are distorted upon passing through the ionised Interstellar Medium (ISM)
Lower frequency components of the pulse are delayed more than higher frequencies
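The frequency dependence of the delay can be made explicit. The block below is a sketch of the standard cold-plasma dispersion relation (see e.g. [1]), not a formula taken from these slides; the symbols $\nu_{\mathrm{lo}}$, $\nu_{\mathrm{hi}}$ and $\mathcal{D}$ are introduced here for illustration. It gives the arrival-time difference between the bottom and top of an observing band for a given Dispersion Measure (DM):

```latex
\Delta t \;\simeq\; \mathcal{D}\,\mathrm{DM}\,
  \left(\nu_{\mathrm{lo}}^{-2} - \nu_{\mathrm{hi}}^{-2}\right),
\qquad
\mathcal{D} \approx 4.15\times10^{3}\ \mathrm{MHz^{2}\,pc^{-1}\,cm^{3}\,s}
```

Here $\nu_{\mathrm{lo}}$ and $\nu_{\mathrm{hi}}$ are in MHz and DM is in pc cm$^{-3}$. The $\nu^{-2}$ scaling is why the lower frequencies arrive later.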
Pulsar De-dispersion
Correct for the dispersion by shifting each frequency component of the received signal by its dispersion delay
Figure 3: Pulsar De-dispersion [4]
Coherent De-dispersion
Coherent de-dispersion is the most accurate method of removing the dispersion effects of the Interstellar Medium (ISM)
Preserves amplitude and phase information from the received signal
Convolve the voltage signal with the inverse transfer function of the ISM
This transfer function depends on the Dispersion Measure (DM) of the signal, which is obtained from models of the galactic electron density
In practice we use the Fast Fourier Transform (FFT) to make the convolution operation a multiplication in the frequency domain and then apply an inverse FFT
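The FFT, multiply, inverse-FFT recipe above can be sketched in a few lines of pure Python. This is a toy illustration rather than the GPU implementation described in these slides. It assumes complex baseband samples taken at a rate equal to the bandwidth, and the phase form of the ISM transfer function given in [1]. The sign convention for the chirp varies between references and sidebands. The names `fft`, `apply_chirp` and `D_CONST` are hypothetical helpers introduced for this example.

```python
import cmath
import math

# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard value, see [1]).
D_CONST = 4.148808e3


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalised); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1.0 if inverse else -1.0
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def apply_chirp(voltage, dm, f0_mhz, bw_mhz, inverse=False):
    """Multiply the spectrum of `voltage` by H(f), or by H^-1(f) if inverse=True.

    Assumed form: H(f0 + f) = exp(i * 2*pi * D * DM * f^2 / ((f0 + f) * f0^2)),
    with f the offset from the band centre f0.
    """
    n = len(voltage)
    spectrum = fft(voltage)  # time domain -> frequency domain
    out = []
    for k, v in enumerate(spectrum):
        # FFT bin k -> frequency offset in MHz (upper half of bins is negative).
        kk = k if k < n // 2 else k - n
        f = kk * bw_mhz / n
        # Factor 1e6 converts the s*MHz product into a dimensionless phase.
        phase = (2.0 * math.pi * D_CONST * 1e6 * dm * f * f
                 / ((f0_mhz + f) * f0_mhz ** 2))
        h = cmath.exp(1j * phase)
        # |H| = 1, so the inverse transfer function is the complex conjugate.
        out.append(v * (h.conjugate() if inverse else h))
    # Back to the time domain, applying the 1/n normalisation here.
    return [c / n for c in fft(out, inverse=True)]
```

Calling `apply_chirp(samples, dm, 1450.0, 100.0, inverse=True)` de-disperses a block of samples; calling it without `inverse=True` simulates the dispersion. Because the multiplication is circular, a real pipeline would process overlapping blocks and discard the wrapped-around edges. The toy sizes that run quickly in pure Python are far smaller than a realistic DM requires.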
Motivation
Why study pulsars? A major SKA science driver:
Detection of gravitational waves and tests of strong-field relativity
Analysing black holes
GPU acceleration for MeerKAT:
Large frequency range (Low: 0.5 – 2.5 GHz, High: 8 – 14.5 GHz)
High bandwidth per polarisation (4 GHz final)
Large number of channels (16384)
>10 GB of data per second
Even more important for SKA since precision will be a high priority and data storage is not feasible
Implementation Considerations
Both CPU and GPU were tested with single-precision floating point
A bottleneck for GPU computing is the time taken to send data to it from main memory – minimise as much as possible
Use asynchronous data transfers to hide the latency
Re-calculate rather than copy data across
Use shared memory on the GPU for calculations and store to global memory at the end
Source data file is simulated dual-polarisation data generated with a DM of 50 pc/cm³ and 100 MHz bandwidth centred on 1450 MHz
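To give a feel for the problem size these test parameters imply, the sketch below estimates the dispersive smearing across the band and the smallest power-of-two FFT length that can contain it. It uses the standard cold-plasma delay formula from [1] and assumes complex sampling at a rate equal to the bandwidth; real-valued sampling would double the sample count. The function names are hypothetical and the result is a lower bound, not the transfer-function length actually used in this work.

```python
import math

# Dispersion constant in s MHz^2 pc^-1 cm^3 (standard value, see [1]).
D_CONST = 4.148808e3


def dispersion_delay_s(dm, f_lo_mhz, f_hi_mhz):
    """Arrival-time difference (seconds) between the band edges."""
    return D_CONST * dm * (f_lo_mhz ** -2 - f_hi_mhz ** -2)


def min_fft_length(dm, centre_mhz, bw_mhz):
    """Return (smear in s, smear in samples, smallest power-of-two FFT length).

    Assumes complex sampling at bw_mhz million samples per second.
    """
    f_lo = centre_mhz - bw_mhz / 2.0
    f_hi = centre_mhz + bw_mhz / 2.0
    smear_s = dispersion_delay_s(dm, f_lo, f_hi)
    smear_samples = math.ceil(smear_s * bw_mhz * 1e6)
    n = 1
    while n < smear_samples:
        n *= 2
    return smear_s, smear_samples, n


# Test-data parameters quoted above: DM 50, 100 MHz centred on 1450 MHz.
smear_s, smear_samples, n_fft = min_fft_length(50.0, 1450.0, 100.0)
print(f"smear = {smear_s * 1e3:.2f} ms, {smear_samples} samples, FFT >= {n_fft}")
```

Under these assumptions the smearing is about 13.6 ms, or roughly 1.36 million samples, so the FFT must span at least 2^21 points. In practice a longer FFT is usually chosen so that the portion discarded to wrap-around is a smaller fraction of each block.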
Basic Program Flow
Figure 4: Program flow
HOST: Read in Data → Allocate memory on GPU → Copy to GPU memory → Initiate GPU Kernel → Receive de-dispersed signal → Free Memory
DEVICE: Begin De-dispersion → Parallel FFT → V(f0)·H⁻¹(f0), V(f1)·H⁻¹(f1), ..., V(fn)·H⁻¹(fn) → Inverse FFT → Output Array → Send Data Back to Host
Results
Figure 5: Left: Overall speedup (5x) Right: Kernel Speedup (12x)
Coherently de-dispersed 50 MHz of bandwidth on 1 GPU
Used 2 GPUs for the full 100 MHz
Scaling across multiple GPUs was linear
Larger transfer functions were found to increase performance, since memory access overhead made up a smaller share of the run time
Conclusion
GPUs are significantly faster than CPUs for de-dispersion
Enabled real-time coherent de-dispersion for the dataset used
Coherent de-dispersion of a 100 MHz bandwidth signal requires multiple GPUs at present
Faster memory access would greatly improve overall speedup
Currently testing with real undetected pulsar data
Thank You!
Questions?
References
1. D. R. Lorimer and M. Kramer, Handbook of Pulsar Astronomy, Cambridge University Press, 2005
2. NVIDIA CUDA Programming Guide
3. D. Manchester, “CSIRO ATNF Pulsar Education Page”
4. Jim Cordes, “The SKA as a Radio Synoptic Survey Telescope: Widefield Surveys for Transients, Pulsars and ETI”, SKA Memo 97
5. John Rowe Animation/Australia Telescope National Facility, CSIRO [Online]. http://www.atnf.csiro.au/research/pulsar/array/gallery.html
6. Cornell University Dept. of Astronomy, “Legacy Pulsars: Homepage” [Online]. http://arecibo.tc.cornell.edu/legacypulsardata/Default.aspx
7. VR-Zone, “The NVIDIA GeForce GTX 280 1GB bare,” [Online]. http://vr-zone.com/articles/nvidia-geforce-gtx-280-preview/5872.html?doc=5872