Page 1

© 2012 MELLANOX TECHNOLOGIES 1

The Exascale Interconnect Technology

Rich Graham – Sr. Solutions Architect

Page 2

Leading Server and Storage Interconnect Provider

Comprehensive End-to-End 10/40/56Gb/s Ethernet and 56Gb/s InfiniBand Portfolio

ICs, Adapter Cards, Switches/Gateways, Software, Cables

Scalability, Reliability, Power, Performance

Page 3

HCA Roadmap of Interconnect Innovations

2002: InfiniHost
World’s first InfiniBand HCA
10Gb/s InfiniBand, PCI-X host interface, 1 million msg/sec

2005: InfiniHost III
World’s first PCIe InfiniBand HCA
20Gb/s InfiniBand, PCIe 1.0, 2 million msg/sec

2008-11: ConnectX (1,2,3)
World’s first Virtual Protocol Interconnect (VPI) Adapter
40Gb/s & 56Gb/s, PCIe 2.0/3.0 x8, 33 million msg/sec

June 2012: Connect-IB
The Exascale Foundation

Page 4

Announcing Connect-IB: The Exascale Foundation

A new interconnect architecture for compute-intensive applications

World’s fastest server and storage interconnect solution, providing 100Gb/s injection bandwidth

Enables unlimited clustering scalability with the new Dynamically Connected Transport service

Accelerates compute-intensive and parallel-intensive applications with over 130 million msg/sec

Optimized for multi-tenant environments with hundreds of virtual machines per server
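The two headline figures on this slide (100Gb/s injection bandwidth and over 130 million msg/sec) can be related with a short calculation: below a certain payload size the message rate, not the link bandwidth, caps throughput. The sketch below is a back-of-the-envelope model that assumes every byte on the wire is payload (headers and protocol overhead are ignored), so it only approximates the crossover point.

```python
def crossover_payload_bytes(link_gbps: float, msg_rate_per_sec: float) -> float:
    """Payload size at which link bandwidth and message rate limit equally.

    Assumption: messages are pure payload (no headers or protocol overhead).
    """
    bytes_per_sec = link_gbps * 1e9 / 8  # Gb/s -> bytes/s
    return bytes_per_sec / msg_rate_per_sec


def achievable_throughput_gbps(payload_bytes: float, link_gbps: float,
                               msg_rate_per_sec: float) -> float:
    """Throughput is capped by whichever limit is reached first."""
    rate_limited_gbps = payload_bytes * 8 * msg_rate_per_sec / 1e9
    return min(link_gbps, rate_limited_gbps)


if __name__ == "__main__":
    link, rate = 100, 130e6  # headline Connect-IB figures from the slide
    print(f"crossover payload: {crossover_payload_bytes(link, rate):.1f} bytes")
    print(f"time per message at full rate: {1e9 / rate:.2f} ns")
    for payload in (8, 64, 256):
        print(payload, "B ->", achievable_throughput_gbps(payload, link, rate), "Gb/s")
```

Under these assumptions the crossover is roughly 96 bytes (about 7.7 ns per message), which is why message rate matters so much for the small messages typical of many parallel applications.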

Page 5

Connect-IB Advanced HPC Features

New innovative transport – Dynamically Connected Transport service
• The new transport service combines the best of:
- Reliable Connected (RC) service – transport reliability
- Unreliable Datagram (UD) – no resource reservation
• Scales out to unlimited cluster sizes for compute and storage
• Eliminates overhead and reduces memory footprint

CoreDirect Collective Hardware Offloads
• Provides ‘state’ to work queue mechanisms for collective offloading in the HCA
• Frees the CPU to do meaningful computation in parallel with collective operations

Derived Data Types
• Hardware support for non-contiguous ‘strided’ memory access
• Scatter/gather optimizations

New Transport Mechanism for Unlimited Scalability

Page 6

Dynamically Connected Transport Service

Page 7

Problems the New Capability Addresses

Transport Scalability
• RC requires a connection per peer, which strains resource requirements at large scale (O(N))
• XRC requires a connection per remote node, which also strains resource requirements at large scale (O(N))

Transport Performance
• UD supports only send/receive semantics, with no support for RDMA or atomic operations
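The O(N) scaling above can be made concrete by counting per-process connection objects for an all-to-all communication pattern. This is a simplified counting model, not a memory calculation: XRC is reduced to one send-side connection per remote node, and the DC pool sizes (4 initiators, 1 target) are assumed values chosen only for illustration.

```python
def rc_qps_per_process(nodes: int, ppn: int) -> int:
    """RC: one connected QP per peer process (all-to-all pattern)."""
    return nodes * ppn - 1


def xrc_qps_per_process(nodes: int) -> int:
    """XRC (simplified): one send-side connection per remote node."""
    return nodes - 1


def dc_objects_per_process(dcis: int = 4, dcts: int = 1) -> int:
    """DC (simplified): a fixed pool of DC initiators plus a DC target.

    The pool sizes are assumptions; they do not grow with the job size.
    """
    return dcis + dcts


if __name__ == "__main__":
    ppn = 16  # processes per node (assumed)
    for nodes in (64, 1024, 16384):
        print(f"{nodes:>6} nodes: RC={rc_qps_per_process(nodes, ppn):>7}  "
              f"XRC={xrc_qps_per_process(nodes):>6}  "
              f"DC={dc_objects_per_process()}")
```

For a 1,024-node job with 16 processes per node, this model gives 16,383 RC QPs and 1,023 XRC connections per process, while the DC pool stays constant.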

Page 8

Dynamically Connected Transport Service Basics

Dynamically Connected (DC) H/W entities
• DC Initiator (DCI) – data source
• DC Target (DCT) – data destination

Key concept
• Reliable communications
- Supports RDMA and atomics
• A single initiator can send to multiple destinations
• Resource footprint scales with:
- Application communication patterns
- Single-node communication characteristics
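A toy model helps show why the DC footprint follows the communication pattern rather than the job size: a DCI is associated with one target at a time, consecutive sends to the same target reuse that association, and switching targets costs a (re)connect. This is not the verbs API and not how the hardware implements connection setup; the class names and the modulo DCI-selection policy are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DCInitiator:
    """Toy DC Initiator: associated with at most one DC Target at a time."""
    current_target: Optional[int] = None
    connects: int = 0

    def send(self, target: int) -> None:
        # Switching to a new target means (re)connecting before the data goes out;
        # back-to-back sends to the same target reuse the current connection.
        if target != self.current_target:
            self.current_target = target
            self.connects += 1


@dataclass
class DCIPool:
    """Fixed-size pool of DCIs shared by all destinations."""
    size: int
    dcis: List[DCInitiator] = field(init=False)

    def __post_init__(self) -> None:
        self.dcis = [DCInitiator() for _ in range(self.size)]

    def send(self, target: int) -> None:
        # Hypothetical static policy: pick the DCI by destination modulo pool size.
        self.dcis[target % self.size].send(target)

    def total_connects(self) -> int:
        return sum(d.connects for d in self.dcis)


if __name__ == "__main__":
    for size in (2, 4):
        pool = DCIPool(size=size)
        for dest in [5, 5, 5, 7, 7, 5]:
            pool.send(dest)
        print(size, "DCIs ->", pool.total_connects(), "connects")
```

With two DCIs, destinations 5 and 7 share one initiator and the sequence costs three connects; with four DCIs they map to different initiators and only two connects are needed. The memory used is set by the pool size, independent of how many peers exist.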

Page 9

Communication Time Line – Common Case

Page 10

COREDirect Enhanced support

Page 11

Problems the New Capability Addresses

Collective communication performance

Collective communication scalability
• For many HPC applications, the scalability of collective communications determines application scalability

System noise
• Uncoordinated system activity causes a slowdown in one process to be magnified at other processes
• The effect grows as the size of the system increases
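Why noise effects grow with system size can be shown with a one-line probability model: a tightly coupled collective is delayed if any participant is delayed. The sketch assumes each process is independently hit by a noise event during a given collective with the same small probability; real noise is neither independent nor identically distributed, so this only illustrates the trend.

```python
def prob_collective_delayed(p_noise: float, nprocs: int) -> float:
    """Probability that at least one of nprocs processes is hit by noise.

    Assumes independent, identically distributed noise events per process;
    a tightly coupled collective is delayed if any participant is delayed.
    """
    if not 0.0 <= p_noise <= 1.0:
        raise ValueError("p_noise must be a probability")
    return 1.0 - (1.0 - p_noise) ** nprocs


if __name__ == "__main__":
    p = 1e-4  # assumed per-process chance of a noise event during one collective
    for n in (16, 1024, 65536, 1_000_000):
        print(f"{n:>8} processes: {prob_collective_delayed(p, n):.3f}")
```

With a per-process probability of 1e-4, roughly 10% of collectives are delayed at 1,024 processes, and delay becomes nearly certain at 65,536 processes.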

Page 12

Scalability of Collective Operations

[Figure panels: Ideal Algorithm; Impact of System Noise]
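The slide's figure does not name the algorithm, so the sketch below assumes recursive doubling as an example of an "ideal" logarithmic-depth barrier. At each step a rank pairs with the rank whose index differs in one bit, and it can finish that step only after its partner has finished the previous step. This makes the impact of system noise visible: a single late rank delays every rank.

```python
def recursive_doubling_barrier(arrival: list, step_latency: float) -> list:
    """Finish time of each rank for a recursive-doubling barrier.

    At step k, rank i pairs with rank i XOR 2**k; a rank completes step k only
    after both partners completed step k-1. Requires a power-of-two rank count.
    """
    n = len(arrival)
    if n == 0 or n & (n - 1):
        raise ValueError("number of ranks must be a power of two")
    t = list(arrival)
    dist = 1
    while dist < n:
        # Each pair waits for its slower member, then pays one message latency.
        t = [max(t[i], t[i ^ dist]) + step_latency for i in range(n)]
        dist <<= 1
    return t


if __name__ == "__main__":
    print(recursive_doubling_barrier([0.0] * 8, step_latency=1.0))
    delayed = [0.0] * 8
    delayed[5] = 10.0  # one rank hit by system noise
    print(recursive_doubling_barrier(delayed, step_latency=1.0))
```

In the ideal case all 8 ranks finish at 3.0 (log2 8 steps of unit latency); when rank 5 arrives 10 units late, every rank finishes at 13.0, so one process's delay is propagated to all.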

Page 13

Scalability of Collective Operations

[Figure panels: Offloaded Algorithm; Nonblocking Algorithm (legend: communication processing)]

Page 14

Key Hardware Features

Managed QPs are progressed by a separate counter (instead of by doorbell)

A ‘wait’ work queue entry waits until a specified completion queue (CQ) reaches a specified producer index value

‘Enable’ tasks release work posted to managed QPs for execution by the H/W

Receive CQs can be set to remain active if they overflow
• Wait events monitor progress

Lists of tasks can be submitted to multiple QPs
• Sufficient to describe collective operations

A special completion queue can be set up to monitor list completion
• Request a CQE from the relevant task
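The claim that task lists are sufficient to describe collective operations can be illustrated with a toy progress engine. Each rank gets an ordered list of "send" and "wait" tasks for a recursive-doubling barrier; a wait blocks the rest of its list until the named completion counter reaches the required value, mirroring the wait-on-producer-index idea above. This is not the verbs or CORE-Direct API: the tuple task format, the per-step counters, and the folding of "enable" semantics into in-order list execution are all simplifying assumptions.

```python
from collections import deque


def build_barrier_tasks(rank: int, nranks: int) -> deque:
    """Describe a recursive-doubling barrier as an ordered list of tasks.

    ("send", peer, step):  post one message that completes on peer's CQ for step.
    ("wait", step, count): block until this rank's CQ for step has count entries.
    """
    tasks = deque()
    dist, step = 1, 0
    while dist < nranks:
        tasks.append(("send", rank ^ dist, step))
        tasks.append(("wait", step, 1))
        dist <<= 1
        step += 1
    return tasks


def run_offloaded_barrier(nranks: int) -> list:
    """Toy progress engine standing in for the HCA.

    Executes every task whose wait condition is met and returns the order in
    which ranks finished their lists (where a final CQE would be requested).
    """
    if nranks < 2 or nranks & (nranks - 1):
        raise ValueError("nranks must be a power of two >= 2")
    nsteps = nranks.bit_length() - 1
    cq = [[0] * nsteps for _ in range(nranks)]  # per-rank, per-step counters
    lists = [build_barrier_tasks(r, nranks) for r in range(nranks)]
    finished = []
    while len(finished) < nranks:
        progressed = False
        for r, tasks in enumerate(lists):
            while tasks:
                kind, a, b = tasks[0]
                if kind == "wait" and cq[r][a] < b:
                    break  # unmet wait entry blocks the rest of this list
                tasks.popleft()
                progressed = True
                if kind == "send":
                    cq[a][b] += 1  # receive completion lands on the peer's CQ
                elif not tasks:
                    finished.append(r)  # last wait satisfied: list complete
        if not progressed:
            raise RuntimeError("no task could make progress")
    return finished


if __name__ == "__main__":
    print(run_offloaded_barrier(8))
```

Once the lists are posted, no host involvement is needed to progress the barrier; the host only has to check for the final completion.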

Page 15

Collective Communication Methodology

Collective Communication Optimizations
• A collective is a communication pattern involving multiple processes
• Optimized collectives involve a communicator-wide, data-dependent communication pattern
• Data needs to be manipulated at intermediate stages of a collective operation
• Collective operations limit application scalability
- For example, through system noise

COREDirect – Key Ideas
• Create a local description of the communication pattern
• Pass the description to the HCA
• Manage the collective operation on the network, freeing the CPU to do meaningful computation
• Poll for collective completion
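The four key ideas map onto a simple host-side pattern: describe, hand off, compute, poll. In the sketch below a worker thread stands in for the HCA and a dictionary stands in for the collective description; both are hypothetical illustrations of the control flow, not of how COREDirect descriptions are actually encoded.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def offload_engine(description: dict) -> int:
    """Stand-in for the HCA executing a posted collective description.

    It waits for a simulated network delay, then applies the reduction.
    """
    time.sleep(description["simulated_latency_s"])
    if description["op"] != "sum":
        raise ValueError("only 'sum' is modelled in this sketch")
    return sum(description["contributions"])


def overlapped_allreduce(contributions, work_items, latency_s=0.05):
    """Describe the collective, hand it off, compute, then poll for completion."""
    # 1. Create a local description of the communication pattern (toy format).
    description = {"op": "sum", "contributions": list(contributions),
                   "simulated_latency_s": latency_s}
    with ThreadPoolExecutor(max_workers=1) as hca:
        # 2. Pass the description to the "HCA" (a worker thread here).
        pending: Future = hca.submit(offload_engine, description)
        # 3. The CPU is free for meaningful computation meanwhile.
        local = sum(x * x for x in work_items)
        # 4. Poll for collective completion.
        while not pending.done():
            time.sleep(0.001)
        return pending.result(), local


if __name__ == "__main__":
    print(overlapped_allreduce([1, 2, 3, 4], range(10)))
```

The design point is that step 3 runs while the collective is in flight, so the collective's latency is hidden behind useful work instead of being spent blocked in a library call.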

Page 16

Barrier Collective

Page 17

Alltoall Collective (128 Bytes)

Page 18

Nonblocking Allgather (Overlap Post-Work-Wait)

Page 19

Nonblocking Alltoall (Overlap-Wait)

Page 20

Non-Contiguous Data Type Support

Page 21

Problems the New Capability Addresses

Transfer of non-contiguous data
• Often triggers data packing in main memory, adding to the communication overhead
• Increases CPU involvement in communication pre/post-processing
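A matrix column is a typical non-contiguous buffer. The sketch contrasts CPU-side packing (an extra copy into a staging buffer) with describing the same data as a strided pattern that a gather engine could read in place. The (byte offset, byte stride, count) tuple is a hypothetical descriptor layout for illustration, not Connect-IB's actual format.

```python
from array import array


def pack_column(matrix: array, ncols: int, col: int) -> array:
    """CPU-side packing: copy one column of a row-major matrix into a new,
    contiguous send buffer. Slicing an array allocates and copies."""
    return matrix[col::ncols]


def strided_descriptor(ncols: int, col: int, nrows: int, itemsize: int) -> tuple:
    """Describe the same column as (byte offset, byte stride, element count).

    Hypothetical descriptor layout, for illustration only.
    """
    return (col * itemsize, ncols * itemsize, nrows)


def gather(raw: bytes, offset: int, stride: int, count: int, itemsize: int) -> bytes:
    """Emulate a gather engine reading the column straight from the source buffer."""
    return b"".join(raw[offset + i * stride: offset + i * stride + itemsize]
                    for i in range(count))


if __name__ == "__main__":
    nrows, ncols = 3, 4
    m = array("d", range(nrows * ncols))  # 3x4 matrix of doubles, row-major
    packed = pack_column(m, ncols, 1)     # extra copy done by the CPU
    desc = strided_descriptor(ncols, 1, nrows, m.itemsize)
    assert gather(m.tobytes(), *desc, m.itemsize) == packed.tobytes()
    print(list(packed), desc)
```

Both paths deliver the same bytes; the difference is that the packing path spends CPU time and memory bandwidth on a staging copy before the transfer can start.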

Page 22

Combining Contiguous Memory Regions
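This slide gives only its title, so the sketch shows one plausible reading: when consecutive entries of a scatter/gather list happen to be back-to-back in memory, they can be merged into fewer, larger entries, reducing per-entry descriptor overhead. The (address, length) list is a generic model; whether the merging is done by hardware or by software is not stated on the slide.

```python
def coalesce_regions(regions):
    """Merge consecutive (address, length) entries that are back-to-back in memory.

    Order is preserved because the gather list order defines the message layout;
    only neighbours in the list are merged.
    """
    merged = []
    for addr, length in regions:
        if length <= 0:
            raise ValueError("region length must be positive")
        if merged and merged[-1][0] + merged[-1][1] == addr:
            prev_addr, prev_len = merged[-1]
            merged[-1] = (prev_addr, prev_len + length)
        else:
            merged.append((addr, length))
    return merged


if __name__ == "__main__":
    sge_list = [(0x1000, 64), (0x1040, 64), (0x2000, 32)]
    print(coalesce_regions(sge_list))
```

The list is deliberately not sorted by address: reordering entries would change the byte order of the transferred message.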

Page 23

Non-Contiguous Memory Access – Regular Access

Supports non-contiguous strided memory access, scatter/gather

[Figure: strided access along the x, y and z dimensions]
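The x, y, z labels suggest regular access to planes of a 3D array, as in a halo exchange. Every such plane is a regular pattern of equally spaced blocks, which is exactly what a strided gather can describe without a packing copy. The sketch assumes x varies fastest in memory and uses a hypothetical (offset, block length, stride, block count) descriptor in elements; it is not Connect-IB's actual descriptor format.

```python
def plane_descriptor(dims, axis, index):
    """(offset, block_length, stride, block_count), in elements, for one plane.

    Assumed layout: x varies fastest, i.e. flat = x + nx * (y + ny * z).
    """
    nx, ny, nz = dims
    if axis == "z":  # one contiguous block of nx * ny elements
        return (index * nx * ny, nx * ny, nx * ny, 1)
    if axis == "y":  # nz blocks of nx contiguous elements
        return (index * nx, nx, nx * ny, nz)
    if axis == "x":  # single elements, every nx-th element
        return (index, 1, nx, ny * nz)
    raise ValueError("axis must be 'x', 'y' or 'z'")


def gather_elements(flat, desc):
    """Emulate a strided gather driven by the descriptor."""
    offset, blocklen, stride, count = desc
    out = []
    for b in range(count):
        start = offset + b * stride
        out.extend(flat[start:start + blocklen])
    return out


if __name__ == "__main__":
    dims = (2, 3, 4)
    flat = list(range(2 * 3 * 4))
    for axis, idx in (("x", 1), ("y", 2), ("z", 1)):
        desc = plane_descriptor(dims, axis, idx)
        print(axis, desc, gather_elements(flat, desc))
```

The z plane is a single contiguous block, the y plane is a set of contiguous rows, and the x plane is the worst case of isolated elements; all three need only one small descriptor each.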

Page 24

THANK YOU