
Page 1

Vendor Tutorial

InfiniBand-over-Distance Transport using Low-Latency WDM Transponders & IB Credit Buffering

Christian Illmer, ADVA Optical Networking

Page 2

InfiniBand-over-distance transport using low-latency WDM transponders and IB credit buffering

October 2008

Page 3


Connectivity performance

[Figure: fiber link capacity [b/s], 100M to 10T, vs. year (1980 to 2010) for WDM, Fibre Channel, Ethernet, and InfiniBand (4x/12x, 4x DDR, 12x QDR), compared against Moore's Law (doubling every 18 months). Source: Ishida, O., "Toward Terabit LAN/WAN" panel, iGRID 2005.]

Bandwidth requirements follow Moore’s Law (# transistors on a chip)

So far, InfiniBand outperforms both Fibre Channel and Ethernet in bandwidth and can keep pace with Moore's growth rate
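As a quick aside (an added illustration, not from the slide), a doubling every 18 months corresponds to

    C(t) = C_0 \cdot 2^{(t - t_0)/1.5\,\mathrm{yr}}

i.e., roughly a factor of 100 per decade, since 2^{10/1.5} ≈ 100.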

Page 4


InfiniBand data rates

InfiniBand               IBx1         IBx4        IBx12
Single Data Rate (SDR)   2.5 Gbit/s   10 Gbit/s   30 Gbit/s
Double Data Rate (DDR)   5 Gbit/s     20 Gbit/s   60 Gbit/s
Quad Data Rate (QDR)     10 Gbit/s    40 Gbit/s   120 Gbit/s

IB uses 8B/10B coding; e.g., IBx1 DDR (5 Gbit/s signaling) delivers 4 Gbit/s of data throughput
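To make the 8B/10B overhead concrete, a minimal sketch (illustrative only; the per-lane signaling rates are those from the table above, the helper name is mine):

    # Illustrative sketch: usable InfiniBand data rate after 8B/10B line coding.
    # Per-lane signaling rates (Gbit/s) are taken from the table above.
    SIGNALING_RATE_PER_LANE = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}
    CODING_EFFICIENCY = 8 / 10   # 8B/10B carries 8 data bits in 10 line bits

    def effective_rate_gbps(speed: str, lanes: int) -> float:
        """Usable data rate in Gbit/s for a given speed grade and lane count."""
        return SIGNALING_RATE_PER_LANE[speed] * lanes * CODING_EFFICIENCY

    print(effective_rate_gbps("DDR", 1))   # 4.0  -> the IBx1 DDR example above
    print(effective_rate_gbps("QDR", 4))   # 32.0 -> IBx4 QDR carries 32 Gbit/s of data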

Copper

Defined for all data rates and multipliers

Serial for SDR x1, DDR x1, QDR x1

Parallel copper cables (x4 or x12)

Fiber optic

Defined for all data rates, up to x4

Serial for SDR x1, DDR x1, QDR x1 and SDR x4 LX (serialized I/F)

Parallel for SDR x4 SX

Page 5


Protocols and bit rates

[Figure: bit rates from 10M to 100G for different protocol families: Ethernet (10bT, FE, GbE, 10GbE, 40GbE, 100GbE), Fibre Channel etc. (ESCON, ETR/CLO, ISC, ISC3, FICON, FICON2, FICON4, 1G/2G/4G/8G/10G-FC), HDD (Ultra160 SCSI, Ultra320 SCSI), synchronous (STM-1, STM-4, STM-16, STM-64, OTU3), and InfiniBand (IBx1 SDR, IBx4 SDR, IBx4 DDR, IBx4 QDR, IBx12 QDR).]

Page 6


CPU connectivity market

[Figure: market share, 0 to 50%, in 2006 and 2007 for GbE, InfiniBand, Myrinet, SP Switch, proprietary interconnects, NUMAlink, Quadrics, Crossbar, Cray interconnect, and mixed.]

Market penetration of different CPU interconnect technologies

InfiniBand clearly dominating new high-end DC implementations

TOP 100 supercomputers: 37% in '07, 50% in '08

Page 7


HPC networks today

Server cluster

Typical HPC DC today

Dedicated networks / technologies for LAN, SAN, CPU (server) interconnect

Consolidation required

FC and GbE HBAs and IB HCAs

[Diagram: today's setup with separate fabrics: servers with Eth, FC, and IB interfaces attached to an Ethernet LAN, an FC SAN, and the IB cluster interconnect.]

Relevant parameters

LAN HBA based on GbE/10GbE

SAN HBAs based on 4G/8G-FC

HCAs based on IB(x4) DDR/QDR

Page 8


Unified InfiniBand architecture

API: Application Programming Interface; VAPI: Verbs API; SDP: Sockets Direct Protocol; TS API: Terminal Server API; SRP: SCSI RDMA Protocol; uDAPL: User-level Direct-Access Programming Library; DAT: Direct Access Transport; BSD Socket: Berkeley Socket API

[Diagram: unified InfiniBand architecture. Protocol stack above the InfiniBand HCA and VAPI: BSD Sockets with TCP/IP over IPoIB; SDP and TS (TS API); MPI; SRP, DAT, and SCSI; uDAPL, NFS-RDMA, FS API, and file system; FCP over FC drivers. The InfiniBand switch (unified fabric for HPC messaging etc.) attaches to the LAN/WAN network via an Ethernet gateway and Ethernet switch, and to the SAN via an FC gateway and FC switch.]

Page 9


HPC networks tomorrow?

Consolidation step: a unified IB switch fabric (IB SF)

IB SF used for the CPU cluster, LAN, and storage (using IPoIB, SRP, and gateways)

LAN now based on IPoIB and an Ethernet gateway

Unlikely to be deployed on a broad scale

[Diagram: server cluster with IB HCAs attached to a single IB switch fabric; an Ethernet gateway carries LAN traffic via IPoIB and an FC gateway carries storage traffic to the FC SAN via the SCSI RDMA protocol.]

Page 10


InfiniBand connections over distance

Why is it relevant?
Data centers disperse geographically (GRID computing, virtualization, disaster recovery, …)

Native, low-latency IB-over-distance transport was still the missing part

Cluster connectivity via IB-over-WDM
WAN protocol is IB, no conversion needed
No additional latency
Fully transparent transport

[Diagram: IB server cluster A and IB server cluster B, each behind an IB switch fabric, connected via IB-over-DWDM across more than 50 km of dark fiber.]

Page 11


InfiniBand throughput over distance

What is the solution?
IB range extender: credit buffering, low latency, and conversion to 10G optical
WDM transport: lowest latency, transparency, capacity, reach, fiber relief

What are the commercial requirements?
Solution must be based on commercial products
Interworking capabilities must be demonstrated

Throughput drops significantly after several meters

Only buffer credits (B2B credits) ensure maximum InfiniBand performance over distance

Buffer credit size directly related to distance

[Figure: throughput vs. distance, with and without B2B credits.]
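The credit/distance relationship is essentially a bandwidth-delay product: the sender needs enough credits to cover all data in flight during one round trip. A hedged sketch, assuming roughly 5 µs/km of one-way fiber delay, 8 Gbit/s of usable data rate (a 10G line after 8B/10B), and a 64-byte credit unit (all illustrative values, not taken from the slides):

    # Illustrative sketch: buffering (and hence flow-control credits) needed to
    # keep a 10G InfiniBand link full over distance -- a bandwidth-delay product.
    # Assumed values, not from the slides: ~5 us/km one-way fiber delay,
    # 8 Gbit/s usable data rate, 64-byte credit unit.
    FIBER_DELAY_US_PER_KM = 5.0
    DATA_RATE_GBPS = 8.0
    CREDIT_UNIT_BYTES = 64

    def credits_needed(distance_km: float) -> int:
        rtt_s = 2 * distance_km * FIBER_DELAY_US_PER_KM * 1e-6   # round-trip time
        in_flight_bytes = DATA_RATE_GBPS * 1e9 / 8 * rtt_s       # data in flight
        return int(in_flight_bytes // CREDIT_UNIT_BYTES) + 1

    for km in (10, 50, 100):
        buf_mb = credits_needed(km) * CREDIT_UNIT_BYTES / 1e6
        print(f"{km:>3} km -> {credits_needed(km)} credits (~{buf_mb:.2f} MB)")

At 50 km this already amounts to roughly half a megabyte of buffering, far more than typical on-board port buffers, which is why dedicated credit buffering in the range extender is required.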

Page 12


Demonstrator setup at HLRS

[Diagram: Cell cluster at the HLRS Nobelstrasse site connected to the IBM cluster at the HLRS Allmandring site via DWDM terminals at both ends, over 0.4…100.4 km of G.652 SSMF.]

Voltaire ISR 2012 Grid Director: 288 x DDR IBx4 ports, 11.5 Tb/s backplane, <450 ns latency

ADVA FSP 2000 DWDM: 4 x 10 Gbit/s transponders, <100 ns link latency

Obsidian Longbow Campus: 4 x SDR copper to 10G optical, 2-port switch architecture, 840 ns port-to-port latency, 10/40 km reach (buffer credits)
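For context, a rough one-way latency budget for this setup; the device latencies are those quoted above, while the ~5 µs/km fiber delay is an assumed typical value for G.652 fiber (not from the slide):

    # Rough one-way latency budget for the demonstrator link (illustrative).
    # Device latencies are the figures quoted above; the fiber delay is assumed.
    SWITCH_NS = 450      # Voltaire ISR 2012, per switch pass
    LONGBOW_NS = 840     # Obsidian Longbow Campus, port-to-port
    WDM_LINK_NS = 100    # ADVA FSP 2000 transponder link

    def one_way_latency_us(distance_km: float) -> float:
        fiber_ns = distance_km * 5000   # ~5 us per km of fiber
        equipment_ns = 2 * SWITCH_NS + 2 * LONGBOW_NS + WDM_LINK_NS  # one switch + one Longbow per site
        return (fiber_ns + equipment_ns) / 1000.0

    print(f"{one_way_latency_us(100.4):.0f} us")   # ~505 us at 100.4 km

At full distance the equipment adds only a few microseconds; propagation in the fiber dominates the latency.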

Page 13


Demonstrator results

[Figure: two plots. Left: SendRecV throughput [GB/s, 0 to 1] vs. message length [0 to 4000 kB] for distances of 0.4, 25.4, 50.4, 75.4, and 100.4 km. Right: SendRecV throughput [GB/s] vs. distance [0 to 100 km] for message lengths of 32, 128, 512, and 4096 kB.]

The Intel MPI benchmark SendRecV was used

Constant performance up to 50 km

Decreasing performance beyond 50 km

Full InfiniBand throughput over more than 50 km
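The measurement loop behind such a SendRecV benchmark can be sketched with mpi4py; this is an illustrative stand-in, not the Intel MPI Benchmarks code, and IMB's exact byte accounting may differ:

    # Minimal SendRecv throughput sketch in mpi4py (illustrative stand-in for
    # the Intel MPI Benchmarks test; run with: mpirun -n 2 python bench.py).
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    peer = 1 - rank                     # assumes exactly two ranks
    msg_len = 4096 * 1024               # 4096 kB message, as in the plots
    reps = 100

    sendbuf = np.zeros(msg_len, dtype=np.uint8)
    recvbuf = np.empty_like(sendbuf)

    comm.Barrier()
    t0 = MPI.Wtime()
    for _ in range(reps):
        comm.Sendrecv(sendbuf, dest=peer, recvbuf=recvbuf, source=peer)
    elapsed = MPI.Wtime() - t0

    if rank == 0:
        gbytes = reps * msg_len / 1e9   # payload sent in one direction
        print(f"SendRecv throughput: {gbytes / elapsed:.2f} GB/s")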

Page 14

Thank you

[email protected]

Page 15

Thank you

Danke