“Future Technologies” (WP8) Prototypes. Iris Christadler, Dr. Herbert Huber, Leibniz Supercomputing Centre, Germany


  • “Future Technologies” (WP8) Prototypes

    Iris Christadler, Dr. Herbert Huber

    Leibniz Supercomputing Centre, Germany

  • Prototype Overview (1/2)

    CEA “GPU/CAPS”

    1U Tesla Server T1070 (CUDA, CAPS, DDT), Intel Harpertown nodes

    Take advantage of accelerators more easily. Compare HMPP with other approaches to programming accelerators.

    CINECA

    I/O Subsystem (SSD, Lustre, pNFS)

    Assess the applicability of new file system and storage technologies.

    CINES-LRZ “LRB/CS”

    Hybrid SGI ICE/UV/Nehalem-EP & Nehalem-EX/ClearSpeed/Larrabee

    Evaluate a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system.

    CSCS “UPC/CAF”

    Prototype PGAS language compilers (CAF + UPC for Cray XT systems)

    Understand the usability and programmability of PGAS languages.

    EPCC “FPGA”

    Maxwell, an FPGA prototype (VHDL support & consultancy + software licenses, e.g., Mitrion-C)

    Assess the potential of high-level languages for using FPGAs in HPC. Compare energy efficiency with other solutions.

    SC09, “Future Technologies” (WP8) Prototypes, Outlook Deliverable D8.3.2

  • Prototype Overview (2/2)

    FZJ “Cell & FPGA interconnect”

    eQPACE (PowerXCell8i cluster with special network processor)

    Gain deep expertise in communication network issues. Extend the application domain of the QPACE system.

    LRZ “RapidMind”

    RapidMind Multi-Core Development Platform (automatic code generation for x86, GPUs and Cell)

    Assess the potential of data stream languages. Compare RapidMind with other approaches for programming accelerators or multi-core systems.

    NCF “ClearSpeed”

    ClearSpeed CATS 700 units

    Evaluate ClearSpeed accelerator hardware for large-scale applications.

    SNIC-KTH

    Air-cooled blade system from Supermicro with AMD Istanbul processors & QDR IB (subject to EC approval)

    Evaluate and optimize energy efficiency and packing density of commodity hardware.

    Experiences with the prototypes will be reported in Deliverable D8.3.2 [http://www.prace-project.eu/documents/public-deliverables-1/]

  • A SELECTION OF RESULTS

    The teaser

  • Rinf

    [Figure: Rinf results.]

  • Euroben results - accelerator languages

    [Figure, two panels: “Accelerator Languages (absolute performance)”, y axis in Mflops, and “Accelerator Languages (% peak perf)”, y axis in % of peak performance. x axis: peak perf, mod2am, mod2as, mod2f. Series: MKL (8 Nehalem cores), CUDA (1 C1060), CellSs (1 PowerXCell8i), Cn (1 CSX700). Callouts: 94%, 81%, 79% and 78% v. peak. Note: mod2f/MKL is single-threaded only.]
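The “% of peak performance” metric used on this slide is the ratio of measured to theoretical peak performance. A minimal sketch of that arithmetic follows; the function name and the numbers in the example are hypothetical and are not taken from the slides.

```python
def percent_of_peak(achieved_mflops: float, peak_mflops: float) -> float:
    """Return measured performance as a percentage of theoretical peak."""
    if peak_mflops <= 0:
        raise ValueError("peak_mflops must be positive")
    return 100.0 * achieved_mflops / peak_mflops


# Hypothetical values for illustration only (not measurements from the deck).
share = percent_of_peak(achieved_mflops=30_000.0, peak_mflops=40_000.0)
print(f"{share:.1f}% of peak")
```

Plotting this ratio next to the absolute Mflops makes devices with very different peak rates comparable on one scale.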

  • Euroben results - GPGPU languages

    [Figure: “Performance Comparison (dense matrix-matrix mul.) on Nvidia C1060”. y axis: Gflops (0 to 100); x axis: matrix size (m). Series: CUDA, CAPS, CUDA+MPI 4x4, RapidMind, OpenCL, MKL (8 cores Nehalem).]
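For readers unfamiliar with how a Gflops figure for dense matrix-matrix multiplication is usually obtained, the sketch below times a naive multiplication and converts the run time into a flop rate, assuming the conventional count of 2·m³ floating-point operations for an m×m product. It is plain Python for illustration only; it is not the Euroben mod2am kernel, and all function names are hypothetical.

```python
import time


def matmul(a, b):
    """Naive dense product C = A * B for matrices stored as lists of rows."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        row_c = c[i]
        for p in range(k):
            a_ip = a[i][p]
            row_b = b[p]
            for j in range(m):
                row_c[j] += a_ip * row_b[j]
    return c


def gflops(m, seconds):
    """Flop rate of an m x m product, counting one multiply and one add per inner step."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2.0 * m ** 3 / seconds / 1e9


def benchmark(m):
    """Multiply two constant m x m matrices and return (result, Gflops)."""
    a = [[1.0] * m for _ in range(m)]
    b = [[2.0] * m for _ in range(m)]
    start = time.perf_counter()
    c = matmul(a, b)
    elapsed = time.perf_counter() - start
    return c, gflops(m, elapsed)
```

Calling `benchmark(m)` for a range of `m` gives one point per matrix size, which is the shape of the curves on this slide.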

  • Euroben results - productivity

    [Figure: “Development Time versus Performance (dense matrix-matrix mul.)”. Left y axis: Development Time in Days; right y axis: Performance in Mflops. Series: Performance, total time, first version, time for benchmarking.]

    * OpenCL and CUDA+MPI port based on existing CUDA port

    ** RapidMind developer included

  • First IO-Results

    [Figure: first I/O results.]

  • PROTOTYPES

    A glimpse of what you will find in Deliverable D8.3.2

  • eQPACE

    Extend the communication capabilities of eQPACE to make it suitable for a wider range of applications. Reach a top position in the Green500 list (FZJ).

    • Hardware: PowerXCell8i processor nodes with custom 3D-torus interconnect.

    • Benchmarks: HPL, Euroben kernels, torus network benchmark, applications & iterative solvers.

    • Programming environments: Cell SDK & CellSs
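As background for the 3D-torus interconnect mentioned on this slide: in a 3D torus each node has six nearest neighbours, one in each direction along each axis, with wrap-around at the edges. The sketch below illustrates only that addressing rule. The function name and the torus dimensions are hypothetical and say nothing about the actual eQPACE configuration.

```python
def torus_neighbours(coord, dims):
    """Return the six nearest neighbours of a node in a 3D torus.

    coord: (x, y, z) position of the node
    dims:  (nx, ny, nz) number of nodes along each axis
    """
    neighbours = []
    for axis in range(3):
        for step in (-1, 1):
            n = list(coord)
            # The modulo implements the wrap-around link at the torus edge.
            n[axis] = (n[axis] + step) % dims[axis]
            neighbours.append(tuple(n))
    return neighbours


# Hypothetical 4 x 4 x 4 torus: the corner node wraps to coordinate 3.
print(torus_neighbours((0, 0, 0), (4, 4, 4)))
```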

  • RapidMind

    Evaluation of the RapidMind programming model (LRZ).

    • Hardware:
      – CPUs (Nehalem EP, AMD Opteron)
      – GPUs (Nvidia Tesla and Quadro FX)
      – Cell (QS22-blade cluster)

    • Software: RapidMind allows code to be written once and run on x86 cores as well as on accelerators such as GPUs and Cell.
      – Evaluate ease-of-use & portability
      – Assess RapidMind performance on different architectures
      – Compare RapidMind with other accelerator languages

    [Figure: “RapidMind mod2am”. y axis: Gflops (0 to 70); x axis: matrix size (m). Series: x86-dp (8 cores Nehalem), cuda-dp (C1060), glsl-sp (FX 5800).]

  • LRZ-CINES

    Evaluation of a hybrid system architecture containing thin nodes, fat nodes and compute accelerators with a shared file system (CINES, LRZ).

    • Hardware:
      – SGI ICE (Nehalem EP)
      – SGI UV (Nehalem EX)
      – ClearSpeed CSX700

    • Benchmarks:
      – Euroben kernels
      – Synthetic BMs: HPL, Rinf, Intel MPI Benchmark, Apex-MAP
      – Application BMs: GADGET, RAxML, SPECFEM3D_GLOBE

  • Hybrid technology demonstrator

    Evaluating GPGPU with CAPS HMPP (CEA).

    • Hardware: Tesla servers connected to Bull servers via PCI-E.

    • Software: CAPS HMPP makes it possible to exploit the potential of GPGPUs by simply adding preprocessor directives to legacy Fortran and C codes.

    [Figures: “CAPS hmpp mod2am” and “CUDA mod2am”. y axis: Gflops; x axis: matrix size (m).]

  • Maxwell FPGA

    Evaluate the performance and usability of the HARWEST Compiling Environment (EPCC).

    • Hardware: FPGA prototype “Maxwell” (32 FPGAs) from both Alpha Data Ltd and Nallatech Ltd using Virtex-4 FPGAs supplied by Xilinx Corp.

    • Benchmarks: 4 Euroben kernels

    • Languages:
      – VHDL
      – HCE

  • PGAS languages

    Evaluate ease of use of the PGAS programming model (CSCS).

    • Hardware: Cray XT5

    • Compiler: Cray Compiler Environment (CCE)

    • Evaluation of the compiler:
      – Functional correctness
      – Conformance with language standards
      – Usability for existing CAF and UPC benchmarks/applications

    • Benchmarks from Rice University, George Washington University and the Lawrence Berkeley National Laboratory

  • ClearSpeed/PetaPath

    Evaluate the ClearSpeed-Petapath system (NCF).

    • Hardware: 114 ClearSpeed CSX700 cards

    • Language: Cn

    • Benchmarks:
      – 4 Euroben kernels
      – 4 Applications
        • Astronomy
        • Geophysics
        • Numerical mathematics
        • Medical tomography

  • XC4-IO

    • Compare storage infrastructure access performance across different hardware configurations and file system architectures (CINECA).

  • SNIC-KTH

    Evaluate energy efficiency of high density commodity parts (SNIC-KTH).

    • Hardware: AMD Istanbul

    • Benchmarks: Euroben, STREAM, IMB, Gromacs, CFD

    • Measure power consumption per component

    • Adjust fan speed and fan power

    • Assess energy management features of AMD Istanbul (control of voltage and frequency of components)

    [Figure: “SNIC-KTH Preliminary Results (Gromacs)”.]

  • RESEARCH ACTIVITIES

    Results will be reported in Deliverable D8.3.2.

  • Parallel GPU

    Evaluation of GPGPU programming languages (CSC).

    • Languages:
      – CUDA+MPI
      – OpenCL

    • Benchmarks:
      – GPU-HMMER
      – Euroben Kernels

    • Hardware:
      – Tesla
      – AMD Firestream
      – CEA WP8 Prototype

  • Advanced PGAS Programming

    Evaluate usability of PGAS

    upc_barrier; upc_forall (sc=0; sc
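The loop on this slide is truncated in the transcript, so its bounds and affinity expression are unknown. As general background only: when the fourth (affinity) argument of `upc_forall` is an integer expression, UPC assigns each iteration to the thread whose index equals that expression modulo `THREADS`. The sketch below emulates this distribution rule in Python for a non-negative loop index used as affinity; the function names are hypothetical and this is not UPC code.

```python
def upc_forall_owner(affinity, threads):
    """Thread that executes an iteration with the given integer affinity."""
    return affinity % threads


def iterations_for_thread(mythread, threads, n):
    """Iterations of a loop over range(n) that one thread executes
    when the loop index itself is used as the affinity expression."""
    return [i for i in range(n) if upc_forall_owner(i, threads) == mythread]


# Emulate 4 threads sharing 10 iterations round-robin.
for t in range(4):
    print(t, iterations_for_thread(t, threads=4, n=10))
```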

  • Research on power efficiency

    Evaluate power consumption of components (STFC, PSNC).

    • Hardware: ClearSpeed, Tesla, Firestream, Cell, Power6.

    • Different workloads: stand-by, neutral, real life, artificial stress.

    • Assess CPU, memories, accelerators, HDDs, cooling fans, backplane, power supply.

    • Power measurements with: clamp meters, PDUs with built-in ammeters, values from system management software
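To show how readings from the instruments listed on this slide are typically turned into efficiency figures, the sketch below converts a current reading into power, integrates sampled power over a run to get energy, and derives a performance-per-watt value. It assumes a known supply voltage and ignores the power factor, which is a simplification for AC measurements. All names and numbers are hypothetical and are not results from the prototypes.

```python
def power_watts(voltage_v, current_a):
    """Power from a current reading, assuming a known voltage and unity power factor."""
    return voltage_v * current_a


def energy_joules(samples):
    """Integrate (time_s, power_w) samples with the trapezoidal rule."""
    total = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def mflops_per_watt(mflops, avg_power_w):
    """Energy efficiency as sustained Mflops divided by average power."""
    if avg_power_w <= 0:
        raise ValueError("avg_power_w must be positive")
    return mflops / avg_power_w


# Hypothetical trace: (seconds, watts) samples taken during one benchmark run.
trace = [(0.0, 100.0), (10.0, 100.0), (20.0, 200.0)]
joules = energy_joules(trace)
avg_w = joules / (trace[-1][0] - trace[0][0])
print(joules, avg_w, mflops_per_watt(1000.0, avg_w))
```

Sampling the same run under the stand-by, neutral, real-life and stress workloads would give one such trace per workload.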

  • Contact information:

    Dr. Herbert Huber (WP8 Leader), [email protected]

    Iris Christadler (WP8 Co-Leader), [email protected]

    Leibniz Supercomputing Centre, Germany

    THANK YOU FOR YOUR ATTENTION! COMMENTS? QUESTIONS?