Scientific Computing with Intel Xeon Phi Coprocessors

Andrey Vladimirov, Colfax International

HPC Advisory Council Stanford Conference 2015

Computing with Xeon Phi Welcome © Colfax International, 2014

Contents

§1 MIC Architecture, Developer’s Perspective

§2 Case Studies

• Astrophysics (offload story)
• N-body simulation (offload vs native in a cluster)
• Financial Monte Carlo (heterogeneous clustering)
• Computational fluid dynamics (legacy code)

§3 Colfax Developer Training

§1. MIC Architecture from Developer’s Perspective

Intel Xeon Phi Coprocessors and the MIC Architecture

PCIe end-point device

High power efficiency

∼1 TFLOP/s in double precision (DP)

Heterogeneous clustering

For highly parallel applications that reach the scaling limits on Intel Xeon processors

Examples of Solutions with the Intel MIC Architecture

Colfax’s CXP7450 workstation with two Intel Xeon Phi coprocessors: xeonphi.com/workstations

Colfax’s CXP9000 server with eight Intel Xeon Phi coprocessors: xeonphi.com/servers

Intel Xeon Phi Coprocessors and the MIC Architecture

Intel Xeon processor (host):

≤18 cores/socket at ≈3 GHz

2-way hyper-threading

Up to 768 GB of DDR3 RAM

256-bit AVX vectors

Intel Xeon Phi coprocessor:

57 to 61 cores at ≈1 GHz

4 hardware threads per core

6–16 GB cached GDDR5 RAM

512-bit IMCI vectors

Both:

C/C++/Fortran; OpenMP/MPI

Linux OS (on host and on coprocessor)
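The ∼1 TFLOP/s double-precision figure quoted earlier is consistent with the core count, clock speed and vector width listed above. A minimal sketch of that arithmetic follows, assuming one fused multiply-add (2 FLOPs) per vector lane per cycle and the clock rounded to 1 GHz as on the slide. The function name `peak_gflops` is illustrative, not from the talk.

```c
/* Theoretical peak in GFLOP/s:
   cores * clock (GHz) * vector lanes * FLOPs per lane per cycle.
   Assumption: each lane retires one fused multiply-add (2 FLOPs) per cycle. */
double peak_gflops(int cores, double clock_ghz, int vector_bits, int element_bits)
{
    const int lanes = vector_bits / element_bits;  /* 512/64 = 8, 512/32 = 16 */
    const double flops_per_lane_per_cycle = 2.0;   /* one FMA */
    return cores * clock_ghz * lanes * flops_per_lane_per_cycle;
}
```

With 61 cores at 1 GHz and 512-bit vectors, `peak_gflops(61, 1.0, 512, 64)` evaluates to 976 GFLOP/s in double precision and `peak_gflops(61, 1.0, 512, 32)` to 1952 GFLOP/s in single precision.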

Linux µOS on Intel Xeon Phi coprocessors (part of MPSS)

user@host% lspci | grep -i "co-processor"
06:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
82:00.0 Co-processor: Intel Corporation Xeon Phi coprocessor 3120 series (rev 20)
user@host% sudo service mpss status
mpss is running
user@host% cat /etc/hosts | grep mic
172.31.1.1 host-mic0 mic0
172.31.2.1 host-mic1 mic1
user@host% ssh mic0
user@mic0% cat /proc/cpuinfo | grep proc | tail -n 3
processor : 237
processor : 238
processor : 239
user@mic0% ls /
amplxe dev home lib64 oldroot proc sbin sys usr
bin etc lib linuxrc opt root sep3.10 tmp var

Offload and Native Modes

Explicit offload mode:
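A minimal sketch of explicit offload, assuming the Intel C compiler's `#pragma offload` language extension and a working MPSS installation. The function `offload_sum` is a hypothetical example, not code from the talk. The host program marks a block for execution on the coprocessor and states which data travel in each direction.

```c
/* Sum an array on the coprocessor using the explicit offload model.
   Assumption: built with the Intel C compiler. Compilers that do not
   recognize the pragma ignore it, and the block runs on the host. */
double offload_sum(double *data, int n)
{
    double sum = 0.0;

    /* in():    copy n elements of data from the host to the coprocessor
       inout(): copy sum to the coprocessor and back when the block ends */
    #pragma offload target(mic) in(data : length(n)) inout(sum)
    {
        for (int i = 0; i < n; i++)
            sum += data[i];
    }
    return sum;
}
```

The rest of the application stays on the host, so only the marked block and its data cross the PCIe bus.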

Native mode:
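In native mode the whole program is compiled for the coprocessor and started there, for example over ssh. The sketch below is illustrative only: the file name, the function names and the build commands in the comment are assumptions, with `mic0` taken from the hosts file shown earlier.

```c
/* Hypothetical build and run steps with the Intel compiler:
     icc -mmic -qopenmp native.c -o native.mic
     scp native.mic mic0:~/
     ssh mic0 ./native.mic                                      */
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Threads an OpenMP parallel region would use; 1 if built without OpenMP. */
int available_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1;
#endif
}

/* __MIC__ is predefined by the Intel compiler when targeting the coprocessor. */
const char *build_target(void)
{
#ifdef __MIC__
    return "coprocessor";
#else
    return "host";
#endif
}

/* Print where the binary was built for and how many threads it can use. */
void report(void)
{
    printf("Built for %s, %d threads available\n",
           build_target(), available_threads());
}
```

The same source builds for the host when `-mmic` is omitted. For a native OpenMP run, the OpenMP runtime library must also be available on the coprocessor.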

Optimization Areas

Common methods for Intel Xeon CPUs and Intel Xeon Phi coprocessors:

1 Scalar optimization (compiler-friendly practices)

2 Vectorization (must use 16-wide single-precision or 8-wide double-precision vectors)

3 Multi-threading (must scale to 100+ threads)

4 Memory access (streaming access or tiling)

5 Communication (offload, MPI traffic control)
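Items 2 to 4 of the list above can be sketched with two small kernels. This is an illustration under stated assumptions, not code from the talk: the function names and the tile size are hypothetical, and the OpenMP 4.0 `simd` construct is used as one portable way to request vectorization.

```c
/* y[i] = a * x[i] + y[i]: unit-stride (streaming) access, iterations
   divided among threads, SIMD lanes used within each thread's chunk. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp parallel for simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

enum { TILE = 16 }; /* illustrative tile size; tune to the cache */

/* Out-of-place transpose of an n x n row-major matrix, processed in
   TILE x TILE blocks so that the lines touched in both the source and
   the destination stay in cache while a block is handled. */
void transpose_tiled(int n, const float *restrict in, float *restrict out)
{
    #pragma omp parallel for
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE)
            for (int i = ii; i < ii + TILE && i < n; i++)
                for (int j = jj; j < jj + TILE && j < n; j++)
                    out[j * n + i] = in[i * n + j];
}
```

Both functions compile unchanged for the host and for the coprocessor. Without OpenMP enabled the pragmas are ignored and the loops run serially.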

Getting Ready for the Future

Knights Landing (KNL) – next generation of Intel MIC architecture

3x the performance of current generation

Available as a stand-alone processor or as a coprocessor

The best way to prepare applications for KNL is to optimize them for Intel Xeon Phi coprocessors based on KNC.

§2. Case Studies

Astrophysical Code HEATCODE: an Offload Story

xeonphi.com/papers/heatcode

N-body Simulation: Offload vs Native in a Cluster

xeonphi.com/papers/nbody-basic

Figure: N-Body Simulation Performance. Vertical axis: single precision GFLOP/s (0 to 2000). Optimization stages: Initial, Multi-threaded, Vectorized with SoA, Scalar Tuning, Tiled and Unrolled. Bar labels, first series: 5.3, 140, 180, 480, 520; second series: 0.8, 120, 220, 870, 1620.

Processor: Intel Xeon E5-2697 v2 Coprocessor: Intel Xeon Phi 7120P

xeonphi.com/papers/sc14
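The "Vectorized with SoA" stage refers to a structure-of-arrays layout, in which each particle coordinate is stored in its own contiguous array so that the inner loop reads memory with unit stride. The sketch below illustrates the layout only. The type and function names are hypothetical, and the interaction weight 1/(r² + ε²) is a simplification chosen to keep the block free of math-library calls; a gravitational kernel would divide by (r² + ε²)^(3/2) instead.

```c
/* Structure of arrays: x[j], y[j], z[j] and m[j] are each contiguous. */
typedef struct {
    int n;
    float *x, *y, *z, *m;
} ParticlesSoA;

/* Accumulate a softened pairwise interaction on every particle:
   ax[i] += sum over j of m[j] * dx / (r2 + eps2), likewise for y and z.
   With eps2 > 0 the j == i term is exactly zero, so no branch is needed. */
void accumulate_pairwise(const ParticlesSoA *p, float eps2,
                         float *ax, float *ay, float *az)
{
    const int n = p->n;
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        float sx = 0.0f, sy = 0.0f, sz = 0.0f;
        #pragma omp simd reduction(+ : sx, sy, sz)
        for (int j = 0; j < n; j++) {
            const float dx = p->x[j] - p->x[i];
            const float dy = p->y[j] - p->y[i];
            const float dz = p->z[j] - p->z[i];
            const float w = p->m[j] / (dx * dx + dy * dy + dz * dz + eps2);
            sx += w * dx;
            sy += w * dy;
            sz += w * dz;
        }
        ax[i] += sx;
        ay[i] += sy;
        az[i] += sz;
    }
}
```

With an array-of-structures layout the same loop would read x, y, z and m with a stride of one whole structure, which generally vectorizes less efficiently.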

Figure: strong scaling of the N-body simulation with N = 2^20 particles. Horizontal axis: number of nodes or coprocessors, P (1, 2, 3, 4, 8, 12, 16). Vertical axis: performance, TFLOP/s (0 to 35). Series: CPU, Intel Xeon E5-2697 v2 (4 nodes); Intel Xeon Phi 7120P coprocessors (4 per node), native MPI; Xeon Phi, MPI + offload. Regions marked 1, 2, 3 and 4 Xeon Phi per node. Annotated efficiencies: 92% and 76%.

xeonphi.com/papers/sc14
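The efficiency annotations in a strong-scaling plot compare measured performance on P devices with P times the performance of a single device. A minimal sketch of that definition, with a hypothetical function name:

```c
/* Strong-scaling parallel efficiency: performance on p devices divided by
   p times the performance of one device (same units for both). */
double parallel_efficiency(double perf_one, double perf_p, int p)
{
    return perf_p / (perf_one * (double)p);
}
```

For example, with made-up numbers, `parallel_efficiency(2.0, 6.0, 4)` returns 0.75: four devices deliver three times, rather than four times, the single-device performance.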

Asian Option Pricing: Heterogeneous Clustering

xeonphi.com/papers/heterogeneous

Computational Fluid Dynamics: Legacy Code

xeonphi.com/papers/shallow

§3. Colfax Developer Training

Colfax Developer Training

Intel Xeon Phi Coprocessor Programming

Future-Proofing Applications for Knights Landing (KNL)

xeonphi.com/training

Free Training for HPCAC Stanford 2015 Participants

xeonphi.com/hpcac2015
