Programming on IBM Cell Triblade
Jagan Jayaraj, Pei-Hung Lin, Mike Knox and Paul Woodward
University of Minnesota
April 1, 2009
Rayleigh–Taylor instability
•An instability of the interface between two fluids of different densities, which occurs when the lighter fluid pushes on the heavier fluid.
•The R-T instability simulation is implemented with a multifluid Piecewise-Parabolic Method (PPM)
•The program is written in Fortran
TriBlade
▫Two QS22 blades, each with 2 PowerXCell 8i CPUs
▫LS21 blade with two dual-core AMD Opterons
▫16GB memory for LS21 and 8GB memory for QS22
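To put the within-Cell level of parallelism in numbers, assuming the standard configuration of 8 SPEs per PowerXCell 8i (a count these slides do not state):

$$
2\ \text{QS22 blades} \times 2\ \frac{\text{PowerXCell 8i}}{\text{blade}} \times 8\ \frac{\text{SPEs}}{\text{chip}} = 32\ \text{SPEs per TriBlade}
$$

alongside the LS21's 2 × 2 = 4 Opteron cores.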
LCSE Cell Cluster
•6 Triblades
•4 QS22 Cell blades
•2 QS20 Cell blades
•4 AMD Quadcore Systems
Login instructions
•Account credentials should be in your email.
•Guest account: lcse / lcse$ncsa!
•Login steps:
▫SSH to frodo.lcse.umn.edu
▫Once logged in to frodo, SSH to an assigned Cell processor host:
AMD: rra001a to rra006a
Cell: rra001b / rra001c to rra006b / rra006c
Software available
•Cell SDK 3.1
•OpenMPI 1.3
•DaCS Fortran bindings
•Compilers
▫AMD: gfortran, gcc 4.1.2
▫PPU: ppuxlf, ppu-gcc
▫SPU: spuxlf, spu-gcc
•Example code is available on /mnt/scratch/NCSA_Example
Compilation and Execution
•On the AMD node:
▫make ppm4f-x86
•On the Cell node:
▫make ppm4f-ppu
•Then run from the AMD node:
▫./ppm4f-x86
•Three levels of parallelism: within-Cell, within-node, node-to-node
•Compute-communication overlap at each level: DMA within the Cell, DaCS within the node, MPI node to node
Triblade programming paradigm
•Single code for Roadrunner and non-RR systems
◦Using lots of #ifdef, #if, #endif…
◦Using a preprocessor to generate three codes
•Minimize manual translation of the SPU code
◦Using a Fortran-to-Cell-C translator, the tedious portions of the SPU code can be translated automatically
•Fortran codes for PPU and AMD
◦Fortran binding programs for the C intrinsic libraries
•Keep the memory footprint small
Programming for IBM Cell Tri-blade
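•Single source code → preprocessor → PPU Fortran code, SPU Fortran code, AMD Fortran code
•SPU Fortran code → translation → SPU C code → SPU C compiler → SPU executable (embedded in the PPU executable)
•PPU Fortran code → PPU Fortran compiler → PPU executable
•AMD Fortran code → GNU Fortran compiler → AMD executable
•The Fortran binding programs are compiled along with the PPU and AMD Fortran codes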
Items to take care of
•Division of labor
▫Define jobs for AMD, PPU and SPU clearly
◦AMD: I/O, MPI, relay data to the Cell
◦PPU: transfer data, manage SPUs
◦SPU: just compute
▫Three codes for three different ISAs
▫Different endianness between PPU (big-endian) and AMD (little-endian): byte-swapping is needed
▫64-bit/32-bit conversion: the SPU supports 32-bit addresses only, but DaCS requires 64-bit address mode
Translator
•Fortran to C with Cell extensions
•Needs directives
•Built with ANTLR
•Handles:
▫Vector and scalar loops
▫DMAs (including list DMAs)
▫Variable declarations
▫Conditional vector moves