
Scientific Computing with Intel Xeon Phi Coprocessors

Andrey Vladimirov, Colfax International

HPC Advisory Council Stanford Conference 2015

Computing with Xeon Phi Welcome © Colfax International, 2014

Contents

§1 MIC Architecture, Developer’s Perspective

§2 Case Studies

   • Astrophysics (offload story)
   • N-body simulation (offload vs native in a cluster)
   • Financial Monte Carlo (heterogeneous clustering)
   • Computational fluid dynamics (legacy code)

§3 Colfax Developer Training


§1. MIC Architecture from Developer’s Perspective


Intel Xeon Phi Coprocessors and the MIC Architecture

PCIe end-point device

High power efficiency

~1 TFLOP/s in double precision (DP)

Heterogeneous clustering

For highly parallel applications that reach the scaling limits on Intel Xeon processors

Examples of Solutions with the Intel MIC Architecture

Colfax’s CXP7450 workstation with two Intel Xeon Phi coprocessors: xeonphi.com/workstations

Colfax’s CXP9000 server with eight Intel Xeon Phi coprocessors: xeonphi.com/servers

Intel Xeon Phi Coprocessors and the MIC Architecture

Intel Xeon processor:

≤18 cores/socket at ≈3 GHz

2-way hyper-threading

Up to 768 GB of DDR3 RAM

256-bit AVX vectors

Intel Xeon Phi coprocessor:

57 to 61 cores at ≈1 GHz

4 hardware threads per core

6–16 GB cached GDDR5 RAM

512-bit IMCI vectors

Common to both:

C/C++/Fortran; OpenMP/MPI

Linux OS (on host and on coprocessor)

Linux µOS on Intel Xeon Phi coprocessors (part of MPSS)

    user@host% lspci | grep -i "co-processor"
    06:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
    82:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
    user@host% sudo service mpss status
    mpss is running
    user@host% cat /etc/hosts | grep mic
    172.31.1.1 host-mic0 mic0
    172.31.2.1 host-mic1 mic1
    user@host% ssh mic0
    user@mic0% cat /proc/cpuinfo | grep proc | tail -n 3
    processor : 237
    processor : 238
    processor : 239
    user@mic0% ls /
    amplxe dev home lib64 oldroot proc sbin sys usr
    bin etc lib linuxrc opt root sep3.10 tmp var

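Offload and Native Modes

Explicit offload mode: the application runs on the host and sends marked sections of code, together with their data, to the coprocessor.

Native mode: the application is compiled for the coprocessor and runs directly on it.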
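The explicit offload mode can be illustrated with a minimal sketch. It assumes the Intel C compiler with offload support and its Intel-specific `#pragma offload target(mic)` directive; the function name `sum_array` is hypothetical and does not come from the slides. Compilers that do not define `__INTEL_OFFLOAD` skip the directive, so the same loop then simply runs on the host.

```c
#include <stdio.h>

/* Sum the first n elements of data.
 * With the Intel compiler and offload enabled, the block below is
 * executed on the coprocessor; otherwise it runs on the host. */
double sum_array(double *data, int n) {
    double sum = 0.0;
#ifdef __INTEL_OFFLOAD
    /* Copy data[0..n-1] to the coprocessor and bring sum back. */
#pragma offload target(mic) in(data : length(n)) inout(sum)
#endif
    {
        for (int i = 0; i < n; i++) {
            sum += data[i];
        }
    }
    return sum;
}
```

For native mode no directive is needed: the unmodified source would instead be built for the coprocessor (with the Intel compiler, the `-mmic` flag) and started on the coprocessor itself, for example after logging in with `ssh mic0`.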

Optimization Areas

Common methods for Intel Xeon CPUs and Intel Xeon Phi coprocessors:

1 Scalar optimization (compiler-friendly practices)

2 Vectorization (must use 16- or 8-wide vectors)

3 Multi-threading (must scale to 100+ threads)

4 Memory access (streaming access or tiling)

5 Communication (offload, MPI traffic control)
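Items 2 to 4 can be sketched together in one loop. With 512-bit vectors, one vector register holds 16 single-precision or 8 double-precision values, which is where the "16- or 8-wide" requirement comes from. The example below is an illustration under assumptions, not code from the talk: the function name `scale_add` is hypothetical, and the combined `parallel for simd` directive requires a compiler with OpenMP 4.0 support. Without OpenMP enabled, the directive is ignored and the loop runs serially.

```c
/* y[i] = a * x[i] + y[i] for i in [0, n).
 * restrict tells the compiler that x and y do not overlap, which helps
 * it vectorize. Access is unit-stride (streaming) in both arrays. */
void scale_add(float *restrict y, const float *restrict x, float a, int n) {
    /* Threads split the iteration range; each thread's chunk is vectorized. */
#pragma omp parallel for simd
    for (int i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}
```

The iteration count has to be large enough to keep every hardware thread busy with full vectors, which is why the same loop structure serves both the processor and the coprocessor.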

Getting Ready for the Future

Knights Landing (KNL) – next generation of Intel MIC architecture

3x the performance of current generation

Available as a stand-alone processor or as a coprocessor


The best way to prepare applications for KNL is to optimize them for Intel Xeon Phi coprocessors based on KNC.

§2. Case Studies


Astrophysical Code HEATCODE: an Offload Story

xeonphi.com/papers/heatcode


N-body Simulation: Offload vs Native in a Cluster

xeonphi.com/papers/nbody-basic

[Figure: N-Body Simulation Performance. Bars compare the optimization stages: Initial, Multi-threaded, Vectorized with SoA, Scalar Tuning, and Tiled/Unrolled.]

Processor: Intel Xeon E5-2697 v2. Coprocessor: Intel Xeon Phi 7120P.

xeonphi.com/papers/sc14

[Figure: strong scaling of the N-body simulation with N=2^20 particles. Horizontal axis: number of nodes or coprocessors (P), from 1 to 16. Vertical axis: performance in TFLOP/s. Series: Intel Xeon E5-2697 v2 CPUs (4 nodes); Intel Xeon Phi 7120P coprocessors (4 per node) with native MPI and with MPI+Offload, using 1, 2, 3, or 4 Xeon Phi per node. Annotated parallel efficiencies: 92% and 76%.]


Asian Option Pricing: Heterogeneous Clustering

xeonphi.com/papers/heterogeneous

Computational Fluid Dynamics: Legacy Code

§3. Colfax Developer Training

Colfax Developer Training

Intel Xeon Phi Coprocessor Programming

Future-Proofing Applications for Knights Landing (KNL)

Free Training for HPCAC Stanford 2015 Participants

xeonphi.com/hpcac2015
