SLINGSHOT INTERCONNECT TECHNOLOGY FOR THE EXASCALE ERA

SLINGSHOT - Cray · OVERVIEW Digital transformation, explosive data growth, and converging HPC and AI workloads have brought us into the Exascale Era. Science is asking



OVERVIEW

Digital transformation, explosive data growth, and converging HPC and AI workloads have brought us into the Exascale Era. Science is asking completely new questions, and it needs new capabilities to answer them.

The Cray® Shasta™ supercomputing architecture introduces the revolutionary capabilities that will power discovery and innovation for years to come. Fundamental to the Cray Shasta system’s transformational capabilities is the Slingshot™ Ethernet network.

Slingshot is Cray’s eighth major generation of high-performance, scalable network and the next evolution in over 25 years of designing and building HPC networks. It is the backbone of the Cray Shasta architecture, providing the performance and scalability required for increasingly data-centric HPC and AI applications. The Cray Slingshot network is built around our new 64-port, 12.8 Tb/s switch, which provides industry-leading 200 Gb/s connectivity to endpoints. This high-radix switch, coupled with Cray’s enhanced Dragonfly topology, enables scaling to over 250,000 endpoints with a maximum of three switch-to-switch hops between endpoints. It also incorporates a host of new features to ensure packets are routed efficiently and network congestion is avoided. Because it is based on the industry-standard Ethernet protocol, Slingshot connects straightforwardly to standard data center environments and third-party storage devices, and it can directly exchange IP/Ethernet traffic with the outside world.


DISCOVER NEW CAPABILITIES

HIGH SPEED, PURPOSE BUILT

WITH A HOST OF NEW FEATURES, SLINGSHOT PROVIDES THE PERFORMANCE, SCALABILITY, AND COST FOR THE MOST CHALLENGING HPC AND SCALE-OUT ETHERNET APPLICATIONS.


KEY FEATURES

Industry-leading performance and scalability

100GbE and 200GbE interfaces

High-radix, 64-port switch with 12.8 Tb/s of bandwidth

Scalability to more than 250,000 host ports with a maximum of three switch-to-switch hops

Innovative hardware congestion management, adaptive routing, and quality of service

Ethernet standards and protocols, plus optimized HPC functionality

Link-level retry and low-latency Forward Error Correction

Standardized, open API management interfaces


EXASCALE-SIZED POSSIBILITIES

SLINGSHOT COMPONENTS

Slingshot High-Performance Switches: The Cray-designed 64-port switch provides 12.8 Tb/s of bandwidth. Each port operates at 200 Gb/s per direction.

Each port can provide Ethernet edge or optimized fabric functionality. Edge ports connect to supported Ethernet NICs or external routers at 100GbE or 200GbE. Fabric ports connect Slingshot switches to one another. The Slingshot switch is supported on both the liquid-cooled and air-cooled platforms.
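The switch figures quoted above are consistent with one another: 64 ports at 200 Gb/s per direction give 12.8 Tb/s. The short Python sketch below only does that arithmetic; the function and constant names are illustrative and do not come from Cray documentation.

```python
# Figures quoted in the text above.
PORTS_PER_SWITCH = 64
PORT_RATE_GBPS = 200  # per direction


def aggregate_bandwidth_tbps(ports: int, rate_gbps: float) -> float:
    """Aggregate per-direction switch bandwidth in Tb/s."""
    return ports * rate_gbps / 1000


if __name__ == "__main__":
    total = aggregate_bandwidth_tbps(PORTS_PER_SWITCH, PORT_RATE_GBPS)
    print(f"{PORTS_PER_SWITCH} ports x {PORT_RATE_GBPS} Gb/s = {total} Tb/s")
```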

Ethernet NICs: The initial interface for Shasta compute nodes will be Cray-certified 100 Gb/s standards-based Ethernet NICs, available for both the liquid- and air-cooled Shasta platforms. Future implementations will include Cray-optimized NICs for improved performance and scale.

KEY FEATURES

High Packet Rate: A high rate of small packets is an increasingly important metric for many applications. Slingshot provides state-of-the-art small packet performance on standard links with more than 1.2 billion packets per second per port (600M packets per second each direction). The Slingshot network delivers high packet rates on IP, messaging, and remote memory access.
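To put the packet-rate figure in context, the sketch below works out the per-port total and the average number of bytes on the wire available to each packet at a 200 Gb/s line rate. For comparison it also computes the packet rate of minimum-size frames under classic Ethernet framing (64-byte frame, 8-byte preamble, 12-byte inter-packet gap). This is arithmetic only: the brochure does not say which framing the 600M figure assumes, and the function names are illustrative.

```python
PORT_RATE_BPS = 200e9        # 200 Gb/s per direction (from the text)
PPS_PER_DIRECTION = 600e6    # 600M packets/s per direction (from the text)


def per_port_packet_rate(pps_per_direction: float) -> float:
    """Total packets per second for a port, counting both directions."""
    return 2 * pps_per_direction


def wire_bytes_per_packet(rate_bps: float, pps: float) -> float:
    """Average bytes of line capacity available to each packet."""
    return rate_bps / pps / 8


def classic_ethernet_max_pps(rate_bps: float) -> float:
    """Packet rate for minimum-size frames with classic Ethernet framing.

    64-byte minimum frame + 8-byte preamble/SFD + 12-byte inter-packet gap.
    """
    return rate_bps / ((64 + 8 + 12) * 8)


if __name__ == "__main__":
    print(f"per port: {per_port_packet_rate(PPS_PER_DIRECTION):.3g} packets/s")
    print(f"budget:   {wire_bytes_per_packet(PORT_RATE_BPS, PPS_PER_DIRECTION):.1f} bytes/packet")
    print(f"classic:  {classic_ethernet_max_pps(PORT_RATE_BPS) / 1e6:.1f} Mpps")
```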

Quality of Service: The ability to manage network resources is highly developed in scale-out data centers but absent from many HPC systems. Slingshot provides system-wide quality of service (QoS) classes. With these comes the ability to control how network bandwidth is allocated.
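One way to picture class-based bandwidth allocation is weighted sharing: each class is entitled to a share of the link in proportion to its weight, and bandwidth a class does not need is redistributed to the others. The sketch below is a generic illustration of that idea, not a description of Slingshot's scheduler; the class names, weights, and demands are hypothetical.

```python
def allocate_bandwidth(link_gbps, classes):
    """Weighted sharing of a link among traffic classes.

    classes maps a class name to (weight, demand_gbps).
    Returns a dict mapping each class name to its allocated Gb/s.
    """
    alloc = {name: 0.0 for name in classes}
    active = {n for n, (w, d) in classes.items() if w > 0 and d > 0}
    remaining = float(link_gbps)

    while active and remaining > 1e-9:
        total_weight = sum(classes[n][0] for n in active)
        satisfied = set()
        spent = 0.0
        for name in active:
            weight, demand = classes[name]
            share = remaining * weight / total_weight
            grant = min(share, demand - alloc[name])
            alloc[name] += grant
            spent += grant
            if demand - alloc[name] <= 1e-9:
                satisfied.add(name)
        remaining -= spent
        if not satisfied:
            break  # every active class took its full share
        active -= satisfied  # redistribute leftovers among the rest
    return alloc


if __name__ == "__main__":
    # Hypothetical classes on a 200 Gb/s link: (weight, demand in Gb/s).
    demo = {"mpi": (3, 180), "storage": (2, 40), "management": (1, 5)}
    for name, gbps in allocate_bandwidth(200, demo).items():
        print(f"{name:>10}: {gbps:.1f} Gb/s")
```

In this example the storage and management classes ask for less than their weighted share, so the unused bandwidth flows to the remaining class.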

LIQUID-COOLED SLINGSHOT DETAILS

The Slingshot switch for the liquid-cooled Shasta platform comes in a switch blade containing the fabric switch silicon, printed circuit board with connections for compute blades, and all components for cooling and power. The architecture supports up to 8 switch blades per switch chassis and up to 64 switch blades per cabinet.

[Figure: liquid-cooled switch blade, showing the external fabric ports (QSFP Double Density) to the network and the internal fabric connectors to the compute blades]


Congestion Management: Slingshot employs a revolutionary congestion management mechanism that provides strong performance isolation between workloads, limiting the impact of poorly behaved applications on production traffic and system services. Congestion management is performed automatically in hardware, quickly identifying sources of congestion and limiting their ingress, while allowing other traffic to continue flowing. The mechanism is stable, quick to converge, robust across a wide variety of traffic patterns, and suitable for the highly dynamic communication patterns found in today’s HPC and converged workloads.
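The behavior described above, limiting only the sources that cause congestion while other traffic keeps flowing, can be illustrated with a toy model. The sketch below is not Slingshot's hardware mechanism, which the brochure does not detail; it simply scales back flows headed to an oversubscribed destination and leaves all other flows untouched. The flow tuples and the function name are hypothetical.

```python
from collections import defaultdict


def throttle_congesting_flows(flows, egress_capacity_gbps):
    """Limit only the flows that target an oversubscribed destination.

    flows is a list of (source, destination, offered_gbps).
    Returns a list of (source, destination, allowed_gbps) in the same order.
    """
    offered = defaultdict(float)
    for _, dst, rate in flows:
        offered[dst] += rate

    result = []
    for src, dst, rate in flows:
        load = offered[dst]
        # Scale back contributors to a congested destination; others pass.
        scale = egress_capacity_gbps / load if load > egress_capacity_gbps else 1.0
        result.append((src, dst, rate * scale))
    return result


if __name__ == "__main__":
    demo = [
        ("a", "x", 150.0),  # three senders oversubscribe destination x
        ("b", "x", 150.0),
        ("c", "x", 100.0),
        ("d", "y", 120.0),  # unrelated flow, should not be slowed
    ]
    for src, dst, allowed in throttle_congesting_flows(demo, 200.0):
        print(f"{src} -> {dst}: {allowed:.0f} Gb/s")
```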

Low-Diameter Network: Cray pioneered the introduction of low-diameter networks with high global bandwidth. The Cray-invented Dragonfly topology is based on a hierarchy of all-to-all networks. With Slingshot, it can scale to over 250,000 endpoints with a maximum of three switch-to-switch hops between any endpoints. This low-diameter network reduces network equipment, cabling, and power and cooling costs. It also facilitates the use of innovative adaptive routing algorithms that improve application performance.
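The scaling claim can be checked with the usual sizing formulas for a maximum-size Dragonfly in which every pair of groups shares one global link. The brochure does not say how the 64 switch ports are divided, so the split used below (16 endpoint ports, 31 local links, 17 global links per switch) is an assumption chosen for illustration. In such a network the longest minimal route is one local hop, one global hop, and one local hop, which matches the three switch-to-switch hops quoted above.

```python
def dragonfly_size(radix, endpoints_per_switch, global_links_per_switch):
    """Size of a maximum-size Dragonfly built from switches of a given radix.

    Remaining ports are used as local links that fully connect the
    switches within a group. Returns (switches_per_group, groups, endpoints).
    """
    local_links = radix - endpoints_per_switch - global_links_per_switch
    if local_links < 0:
        raise ValueError("port split exceeds switch radix")
    switches_per_group = local_links + 1              # all-to-all inside a group
    groups = switches_per_group * global_links_per_switch + 1  # all-to-all between groups
    endpoints = endpoints_per_switch * switches_per_group * groups
    return switches_per_group, groups, endpoints


if __name__ == "__main__":
    # Assumed port split for a 64-port switch: 16 endpoints, 17 global links.
    a, g, n = dragonfly_size(64, 16, 17)
    print(f"{a} switches/group, {g} groups, {n:,} endpoints")
```

With this assumed split the network reaches 279,040 endpoints, in line with the "over 250,000" figure.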

AIR-COOLED SLINGSHOT DETAILS

The Slingshot switch for the air-cooled Shasta platform comes in a standard top-of-rack (TOR), 19-inch, 1U form factor. It has 64 front-facing ports for cabling.

Adaptive Routing: Rather than routing traffic along a predetermined path, Slingshot routes packets dynamically according to load. Routes are chosen using real-time, global load information that is gathered by the hardware and distributed among the switches, giving each switch an up-to-date picture of which paths to use or avoid. This approach provides both adaptive routing of ordered flows and packet-by-packet adaptive routing of traffic that does not require ordering.
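As a rough illustration of load-based path selection, the toy router below picks the candidate path whose busiest link is least loaded. Traffic that needs ordering is kept on the path chosen for the first packet of its flow, while unordered packets are routed one at a time. This is a simplified sketch under those assumptions, not Slingshot's routing algorithm; the class, link names, and load values are hypothetical.

```python
def path_cost(path, link_load):
    """Cost of a path: the load of its most heavily loaded link."""
    return max(link_load[link] for link in path)


def least_loaded_path(paths, link_load):
    """Prefer the lowest cost; break ties in favor of fewer hops."""
    return min(paths, key=lambda p: (path_cost(p, link_load), len(p)))


class ToyAdaptiveRouter:
    def __init__(self, paths):
        self.paths = paths
        self.flow_paths = {}  # flow id -> path pinned for an ordered flow

    def route(self, link_load, flow_id=None, ordered=False):
        """Return the path for one packet given the current link loads."""
        if ordered and flow_id in self.flow_paths:
            return self.flow_paths[flow_id]  # keep ordered flows in order
        path = least_loaded_path(self.paths, link_load)
        if ordered:
            self.flow_paths[flow_id] = path
        return path


if __name__ == "__main__":
    paths = [("s1-s2",), ("s1-s3", "s3-s2")]  # direct path and one detour
    router = ToyAdaptiveRouter(paths)
    quiet = {"s1-s2": 0.1, "s1-s3": 0.2, "s3-s2": 0.3}
    busy = {"s1-s2": 0.9, "s1-s3": 0.2, "s3-s2": 0.3}
    print(router.route(quiet, flow_id=7, ordered=True))  # flow pinned to direct path
    print(router.route(busy, flow_id=7, ordered=True))   # stays on its pinned path
    print(router.route(busy))                            # unordered packet takes the detour
```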

Ethernet Standards: The Slingshot network implements a full range of Ethernet standards with hardware support focusing on the data movement protocols (IPv4 and IPv6, in particular). The Cray design separates data and control planes, allowing use of network function virtualization to implement a wide range of upper layer management and routing protocols.

Telemetry: Capturing network statistics is critical to understanding and optimizing performance. Slingshot telemetry data enables analysis of application performance, load, hotspots, and the causes of congestion.
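A simple example of the kind of analysis telemetry makes possible is flagging hotspots from per-port byte counters. The sketch below assumes a hypothetical data shape (a mapping from port name to bytes sent during a sampling interval); it does not reflect the actual format of Slingshot telemetry, which the brochure does not describe.

```python
def find_hotspots(byte_counts, interval_s, port_rate_gbps=200.0, threshold=0.9):
    """Return ports whose utilization meets or exceeds the threshold.

    byte_counts maps a port name to bytes sent during the interval.
    Returns a dict mapping each hot port to its utilization (0.0 to 1.0).
    """
    capacity_bits = interval_s * port_rate_gbps * 1e9
    hotspots = {}
    for port, nbytes in byte_counts.items():
        utilization = nbytes * 8 / capacity_bits
        if utilization >= threshold:
            hotspots[port] = utilization
    return hotspots


if __name__ == "__main__":
    # Hypothetical one-second sample from two ports.
    sample = {"sw0/p1": 24e9, "sw0/p2": 5e9}
    for port, util in find_hotspots(sample, interval_s=1.0).items():
        print(f"{port}: {util:.0%} utilized")
```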

[Figure: air-cooled switch, showing 64 × 200 Gb/s network ports (QSFP Double Density connectors)]


©2019 Cray Inc. All rights reserved. www.cray.com. Cray and the Cray logo are registered trademarks, and Shasta and Slingshot are trademarks, of Cray, a Hewlett Packard Enterprise company. All other trademarks mentioned herein are the properties of their respective owners. 201901010

WWW.CRAY.COM