36
© ARM 2017 HPC Network Stack on ARM Pavel Shamis (Pasha) Principal Research Engineer ARM Research ExaComm 2017 06/22/2017

Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

Title 44pt sentence case

Affiliations 24pt sentence case

20pt sentence case

© ARM 2017

HPC Network Stack on ARM

Pavel Shamis (Pasha) Principal Research Engineer ARM Research

ExaComm 2017 06/22/2017

Page 2: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 2

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

HPC network stack on ARM ?

Page 3: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 3

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Serious ARM HPC deployments starting in 2017

§  ARM – Emerging CPU architecture in HPC and server space §  Future deployments – Islamabad Cray CS-400, Japan Post-K ARMv8

Page 4: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 4

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

An introduction to ARM

ARM is the world's leading semiconductor intellectual property supplier. We license to over 350 partners, are present in 95% of smart phones, 80% of digital cameras, 35% of all electronic devices, and a total of 60 billion ARM cores have been shipped since 1990.

Our CPU business model: License technology to partners, who use it to create their own system-on-chip (SoC) products. §  We may license an instruction set architecture (ISA) such as

“ARMv8-A”) §  or a specific implementation, such as “Cortex-A72”. §  Partners who license an ISA can create their own

implementation, as long as it passes the compliance tests.

…and our IP extends beyond the CPU

Page 5: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 5

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Range of SoCs addressing infrastructure

Highly Accelerated Massively Multicore

One size does not fit all

QorIQ®Layerscape2080A 

Page 6: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 6

Text 54pt sentence case Integration with Network Interconnects

Page 7: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 7

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

CCIX

§  Accelerators and Network (NIC/HCA/etc.) as a first class “citizen” in the system

§  Seamless process and accelerator hardware cache coherence support

§  Low-latency and high-bandwidth §  Allow in-line acceleration

§  Bump in the wire processing (network packet processing, storage acceleration, etc.)

§  Allows “off-line” acceleration (co-processor model) §  Driver-less / interrupt-less usage model §  http://www.ccixconsortium.com

Page 8: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 8

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Scale-up server node

DMC-620

DMC-620 DMC-620

DMC-620

CoreLink CMN-600

Shared virtual memory system

DMC-620

DMC-620 DMC-620

DMC-620

CoreLink CMN-600

Accelerator

Smart Network

Persistent Memory

Page 9: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 9

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

CCIX multichip connectivity and topologies

§  New class of interconnect providing high performance, low latency for new accelerators use cases §  CCIX defines 25GT/s (3x performance*) §  Examining 56GT/s (7x performance*) and beyond §  Enabling low latency via light transaction layer 

§  Flexible, scalable interconnect topologies §  Flexible point-to-point, daisy chained and switched topologies

§  Simplified deployment by leveraging existing PCIe

hardware and software infrastructure §  Runs on existing PCIe transport layer and management stack §  Coexist with legacy PCIe designs

Compute Node

Accelerator

Smart Network

Persistent Memory

Switch

* Note: Based on PCIe Gen3 Performance

Page 10: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 10

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Building CCIX devices

§  Cadence® IP for CCIX §  Built upon silicon proven PCIe solutions

§  IP products: §  Controller IP – Provides the CCIX transaction and

data link layers. §  PHY IP – Provides the high performance SERDES

physical layer supporting speeds up to 25Gpbs. §  Verification IP – Provides the necessary test

infrastructure to verify CCIX designs.

Cadence Controller & PHY IP

Page 11: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 11

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Cadence CCIX integration

PCIe

Tr

ansa

ctio

n La

yer

CC

IX

Tran

sact

ion

Laye

r

Dat

a Li

nk L

ayer

PH

Y (

up t

o 25

Gpb

s)

16 Lanes

DMC-620

DMC-620 DMC-620

DMC-620

CoreLink CMN-600 XP

CM

L R

NI

CXS

AXI

Cadence IP

Example CMN-600 mesh design Low latency CCIX transaction layer

Support up to 25Gbps vs 16Gbps PCIe Gen4

CML converts CHI to CCIX messages

PCIe IP connects to a CMN IO interface via AXI

Page 12: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 12

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Gen-Z

§  All data is accessed by some form of a Read or a Write §  Example of reads: DDR Row + Column Read, PCI DMA Read, SCSI Write,

Socket Read, File Read, RDMA Read §  Example of writes: DDR Row + Column Write, PCI DMA Write, SCSI Read,

Socket Write, File Write, RDMA Write §  The Goal: Simplify world to memory semantic Reads & Writes

Page 13: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 13

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Gen-Z Overview

§  An open, standards-based, scalable, system interconnect and protocol. §  Optimized to support memory semantic communications §  Breaks Processor-Memory Interlock §  Split controller model

§  Memory controller §  Initiates high-level requests—Read, Write, Atomic, Put / Get, etc. §  Enforces ordering, reliability, path selection, etc.

§  Media controller §  Abstracts memory media §  Supports volatile / non-volatile / mixed-media

§  Performs media-specific operations §  Executes requests and returns responses §  Enables data-centric computing (accelerator, compute, etc.)

Page 14: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 14

Text 54pt sentence case Software Stack Overview

Page 15: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 15

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Linux / FreeBSD w/ AARCH64 support

ß Engaged with FreeBSD foundation / Semi-half & Cavium to get FreeBSD on ARMv8 FreeBSD Beta version demo’d by Semihalf – Nov. 2015

12.04LTS & 14.04LTS released

ß Also 14.10 & 15.04 releases

Red Hat Enterprise Linux Server for ARM 7.2 BETA – Sept, 2015

Fedora 22 released – May 2015 Fedora 23 released – Nov 2015

CentOS Linux 7 for AArch64 GA – August 2015

OpenSUSE 13.2 – Nov 2014 SUSE Launches Partner Program to Bring SUSE Linux Enterprise 12 to 64-bit ARM

July 2015 @ ISC

Debian 8 adds AARCH64 – April 2015

Page 16: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 16

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Open source and commercial compilers

§  LLVM §  C, C++ §  OpenMP 3.1, (4.0 coming soon) §  Fortran coming Q1 2017

§  PathScale §  C, C++ Fortran §  OpenACC §  OpenMP 4.0

§  NAG §  Fortran §  OpenMP 3.1

§  GCC §  C, C++, Fortran §  OpenMP 4.0

§  ARM C/C++ Compiler §  LLVM based §  Includes SVE

Page 17: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 17

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Released

Planned

Concept

2016

Hardware

Open-Source software

ARM HPC tools

ARM HPC ecosystem roadmap

2017 Future

Cavium ThunderX

Allinea DDT and MAP Rogue Wave TotalView

AppliedMicro X-Gene 1 & 2

Fujitsu – Post K (SVE)

ISV software

AMD Seattle

Altair PBS Pro

Cavium ThunderX2

Qualcomm Centriq

NAG Library & Compiler

ARM Optimized Routines

GCC (gcc/g++/gfortran) LLVM - clang LLVM – Flang

ISV software

ARM Code Advisor (Beta) ARM Code Advisor (Full release)

ARM Optimized Routines – vector versions

Phytium Mars

OpenHPC 1.2

ARM Performance Libraries

PathScale ENZO

ARM Fortran Compiler ARM C/C++ Compiler – ahead of LLVM trunk

ARM Instruction Emulator

AppliedMicro X-Gene 3

Page 18: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 18

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

RDMA Networks

§  Remote Direct Memory Access (RDMA) – popular hardware network technology §  InfiniBand – 37% of systems in TOP500

https://www.top500.org

Page 19: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 19

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

RDMA Support

§  Mellanox OFED 2.4 and above supports ARM §  Linux Kernel 4.5.0 and above (maybe even earlier) §  Rdma-Core – runs on ARM §  OFED – No support §  Linux Distribution – on going process

Page 20: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 20

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

OpenUCX v1.2

•  The first official release from OpenUCX community •  https://github.com/openucx/ucx/releases/tag/v1.2.0

•  Features •  Support for InfiniBand and RoCE •  Transports RC, UD, DC

•  Support for Accelerated Verbs – 40% speedup on ARM compared to vanilla Verbs

•  Support for Cray Aries and Gemini •  Support for Shared Memory: KNEM, CMA, XPMEM, Posix, SySV •  Support for x86, ARMv8, Power •  Efficient memory polling – 36% increase in efficiency on ARM •  UCX interface is integrated with MPICH, OpenMPI, OSHMEM, ORNL-

SHMEM, etc.

Pavel Shamis, M. Graham Lopez, and Gilad Shainer. “Enabling One-sided Communication Semantics on ARM“, HIPS 2017

Page 21: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 21

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Programing models

§  Open MPI – compiles and runs on ARMv8 §  Continues integration with HPCAC ARMv8 server

§  MPICH – compiles and runs on ARMv8 §  MVAPICH – compiles (with patches) and runs §  OSHMEM – compiles and runs

§  Continues integration with HPCAC ARMv8 server

Page 22: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 22

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Example: MPI+SHMEM+OpenUCX on InfiniBand

Page 23: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 23

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Lessons Learned

§  Memory Barriers §  Multithread environment §  Software-hardware interaction §  Examples https://github.com/openucx/ucx/blob/master/src/ucs/arch/aarch64/cpu.h#L25 §  You can “fish” for these bugs in MPI implementations around Eager-RDMA and shared

memory protocols Write Payload

Write Notify

Busy-wait Read Notify

Read Payload

Write Barrier Read Barrier

RDMA

RDMA

Maranget, Luc, Susmit Sarkar, and Peter Sewell. "A tutorial introduction to the ARM and POWER relaxed memory models." Draft available from http://www. cl. cam. ac. uk/~ pes20/ppc-supplemental/test7. pdf (2012).

Page 24: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 24

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

More About Barriers

§  There are multiple types of barriers §  DSB

§  Completion semantics §  Interaction with external devices (PCIe doorbells) §  Device drivers

§  DMB §  ISH* domain on Linux §  Poll-flag, barrier, data

§  ISB

Page 25: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 25

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Lessons Learned - continued

§  Low-level timers §  Typically found in benchmarks and MPI §  Code examples https://github.com/openucx/ucx/blob/master/src/ucs/arch/aarch64/cpu.h#L35

Page 26: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 26

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Lessons Learned – continued

§  Not all cache-lines are 64Byte ! §  Implementation dependent §  128Byte and 64Byte

http://xeroxnostalgia.com/duplicators/xerox-9200/

Page 27: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 27

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Optimizations

§  AVX => Neon §  Mostly found around communication request initialization codes

https://github.com/openucx/ucx/blob/891e20ef90257d1e2721da52461b0261220c82d8/src/uct/ib/mlx5/ib_mlx5.inl#L160

§  Busy-wait loop §  See Wait-For-Event (WFE)

Pavel Shamis, M. Graham Lopez, and Gilad Shainer. “Enabling One-sided Communication Semantics on ARM“, HIPS 2017

Page 28: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 28

Text 54pt sentence case Preliminary Results

Page 29: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 29

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Testbed

•  2 x Softiron Overdrive 3000 servers with AMD Opteron A1100 / 2GHz •  ConnectX-4 IB/VPI EDR (PCIe gen2 x8) •  Ubuntu 16.04 •  MOFED 3.3-1.5.0.0 •  UCX [0558b41] •  XPMEM [bdfcc52] •  OSHMEM/OPEN-MPI [fed4849]

Page 30: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 30

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

OpenUCX IB: MLX5 vs Verbs

��

��

��

���

���

���

����

� � � �� �� �� ���

���

���

����

����

����

����

�����

�����

�����

������

������

������

�������

�������

�����������������

��������

��� �������

���� �������� ������� �������� ���

���

���

���

���

���

���

� � � � �� �� �� ���

���

���

����

����������

����������

�����

��������

��� ��� ������� ����

���� �������� ���

���

����

����

����

����

����

����

� � � �� �� �� ���

���

���

����

����

����

����

�����

�����

�����

������

������

������

�������

�������

����

��������

��� ��� ���������

���� �������� ���

���

���

���

���

�����

������

����

��

�������

�����

������

����

��

�������

������������

������������

��� ������ ������ ���������� �����

���������

40%

Page 31: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 31

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

OpenUCX: XPMEM

�����������

�������

����

������

���������

����

� � � �� �� �� ���

���

���

����

����

����

����

�����

�����

�����

������

������

������

�������

�������

�����������������

��������

��� �������

����� �������� ���

�����

�����

�����

�����

�����

�����

�����

�����

� � � � �� �� �� ���

���

���

����

���������������

��������

��� ������� ����

����� �������� ���

����

����

����

����

�����

�����

�����

�����

� � � �� �� �� ���

���

���

����

����

����

����

�����

�����

�����

������

������

������

�������

�������

����

��������

��� ���������

����� �������� ���

����

����

����

����

����

����

����

����

����

���

�����

������

������

�������

�����

������

������

�������

������������

������������

��� ������ ������ ���������� �����

Page 32: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 32

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

SHMEM_WAIT()

���

���

���

��� ���������������������

�������������������

���������������������

���

������� �� ����� ������������� ������������

�����

�����

�������

�����

��� ���������������������

�������������������

����������������

���

����� �������������������� �����

�����

�����

�����

�����

�����

�������

�������

�������

��� ���������������������

�������������������

����������

���

����� �������������� �����

73%

35%

Page 33: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 33

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

OpenSHMEM SSCA

���

���

���

���

���

� � � ��

�������������

���

���� ���������

��� �������� ����

��� �����

7-30%

Page 34: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 34

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

OpenSHMEM GUPs

Pavel Shamis, M. Graham Lopez, and Gilad Shainer. “Enabling One-sided Communication Semantics on ARM“

21%

�������

�������

�������

�������

�������

�������

�������

� � � ��

��������������

����

�����

���

��������� ���� ���������

��� ���� ���������� ����� �������

��� ���� ���������������� ����� �������������

Page 35: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

© ARM 2017 35

Title 40pt sentence case

Bullets 24pt sentence case

Sub-bullets 20pt sentence case

Summary

§  Linux RDMA community is doing great job ! §  A lot of progress was made in ARM HPC/server software

eco-system

Page 36: Pavel Shamis (Pasha)nowlab.cse.ohio-state.edu/static/media/workshops/... · FreeBSD Beta version demo’d by Semihalf – Nov. 2015 12.04LTS & 14.04LTS released ß Also 14.10 & 15.04

The trademarks featured in this presentation are registered and/or unregistered trademarks of ARM Limited (or its subsidiaries) in the US and/or elsewhere. All rights reserved. All other marks featured may be trademarks of their respective owners. Copyright © 2017 ARM Limited © ARM 2017