
100 Gb/s InfiniBand Transport over up to 100 km

Klaus Grobe and Uli Schlegel, ADVA Optical Networking, and David Southwell, Obsidian Strategics, TNC2009, Málaga, June 2009


Agenda

InfiniBand in Data Centers

InfiniBand Distance Transport


InfiniBand in Data Centers


Connectivity performance

Bandwidth requirements follow Moore’s Law (the number of transistors on a chip). So far, both Ethernet and InfiniBand have outpaced Moore’s growth rate.

Adapted from: Ishida, O., “Toward Terabit LAN/WAN” Panel, iGRID2005

[Chart: fiber link capacity (100 Mb/s to 10 Tb/s, log scale) vs. year (1990–2010) for WDM, Ethernet, Fibre Channel (FC), and InfiniBand, compared against Moore’s Law (doubling every 18 months).]

[Chart: InfiniBand bandwidth per direction (10–640 Gb/s) vs. year (2008–2011) for QDR, EDR, and HDR in x1, x4, and x12 link widths.]


InfiniBand Data Rates

InfiniBand              IBx1       IBx4      IBx12
Single Data Rate, SDR   2.5 Gb/s   10 Gb/s   30 Gb/s
Double Data Rate, DDR   5 Gb/s     20 Gb/s   60 Gb/s
Quad Data Rate, QDR     10 Gb/s    40 Gb/s   120 Gb/s

IB uses 8b/10b line coding, so the usable data rate is 80% of the signaling rate; e.g., IBx1 DDR (5 Gb/s signaling) has 4 Gb/s throughput.
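As a quick illustration of the 8b/10b overhead, the following minimal Python sketch (an addition, not part of the original slides) reproduces the effective data rates for the speed grades and link widths listed above:

# Effective InfiniBand data rates after 8b/10b line coding (illustrative sketch).
# Per-lane signaling rates are taken from the table above; the 0.8 factor
# reflects 8b/10b coding (8 data bits carried in 10 line bits).
LANE_SIGNALING_GBPS = {"SDR": 2.5, "DDR": 5.0, "QDR": 10.0}
CODING_EFFICIENCY = 8 / 10

def effective_rate_gbps(speed: str, lanes: int) -> float:
    """Usable data rate in Gb/s for a given speed grade and link width."""
    return LANE_SIGNALING_GBPS[speed] * lanes * CODING_EFFICIENCY

for speed in ("SDR", "DDR", "QDR"):
    for lanes in (1, 4, 12):
        print(f"IBx{lanes} {speed}: {effective_rate_gbps(speed, lanes):.0f} Gb/s data")

For example, IBx1 DDR yields 4 Gb/s of data, and IBx4 QDR yields 32 Gb/s on a 40 Gb/s link.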

Copper

Serial (x1; rarely seen on the market)

Parallel copper cables (x4, x12)

Fiber Optic

Serial for x1 and SDR x4 LX (serialized I/F)

Parallel for x4, x12


Converged Architectures

SRP – SCSI RDMA Protocol

[Diagram: SCSI transport stacks beneath the operating system/application, arranged along latency and performance axes: iSCSI (over TCP/IP/Ethernet, lossy), FCP over FCIP (TCP/IP/Ethernet, lossy), FCP over iFCP (TCP/IP/Ethernet, lossy), FCP over FCoE (DCB Ethernet, lossless), and SRP (over InfiniBand, lossless).]


HPC Networks today


Typical HPC Data Center today

Dedicated networks / technologies for LAN, SAN, CPU (server) interconnect

Consolidation required (management complexity, cables, cost, power)

FC and GbE HBAs and IB HCAs

[Diagram: each server in the cluster carries Ethernet, FC, and IB adapters and attaches to three separate networks: an Ethernet LAN, an FC SAN, and an IB server interconnect.]

Relevant parameters:

LAN HBAs based on GbE/10GbE

SAN HBAs based on 4G/8G-FC

HCAs based on IBx4 DDR/QDR


InfiniBand Distance Transport


Generic NREN

[Diagram: generic NREN topology. Core (backbone) routers are interconnected via OXCs/ROADMs; Layer-2 switches connect data centers (DC) to the backbone; a large, dispersed metro campus or cluster of campuses connects to the backbone (NREN); large data centers attach via dedicated (P2P) connections.]


InfiniBand-over-Distance: Difficulties and solution considerations

Technical difficulties:

IB-over-copper – limited distance (<15 m)

IB-to-XYZ conversion – high latency

Today’s IB switches lack the buffer credits needed for distance transport

High-speed serialization and E-O conversion needed

Requirements:

Lowest latency, and hence highest throughput, is a must

Interworking must be demonstrated


InfiniBand Flow Control

InfiniBand flow control is credit-based per virtual lane (up to 16 virtual lanes)

On initialization, each fabric end-point declares its capacity to receive data

This capacity is described as its buffer credit

As buffers are freed up, end points post messages updating their credit status

InfiniBand flow control happens before transmission, not after it – lossless transport

Optimized for short signal flight times; the small buffers inside the ICs limit the effective range to ~300 m

[Diagram: credit flow between HCA A and HCA B: (1) data is read from system memory, (2) sent across the IB link, and (3) written into the remote system memory; (4) a credit update is returned to the sender.]
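To make the credit mechanism concrete, here is a minimal, hypothetical Python sketch (an illustration under simplified assumptions, not the InfiniBand specification): the sender may transmit only while it holds credits, and the receiver returns credits as it drains its buffer.

# Minimal sketch of credit-based, lossless flow control on one virtual lane.
from collections import deque

class Receiver:
    def __init__(self, buffer_slots: int):
        self.buffer = deque()
        self.initial_credits = buffer_slots  # advertised at link initialization

    def accept(self, packet) -> None:
        self.buffer.append(packet)           # packet lands in a reserved slot

    def drain(self) -> int:
        """Process all buffered packets and return the freed slots as a credit update."""
        freed = len(self.buffer)
        self.buffer.clear()
        return freed

class Sender:
    def __init__(self, initial_credits: int):
        self.credits = initial_credits       # learned from the receiver

    def send(self, packet, rx: Receiver) -> bool:
        if self.credits == 0:                # no credit: wait instead of dropping
            return False
        self.credits -= 1
        rx.accept(packet)
        return True

    def credit_update(self, freed: int) -> None:
        self.credits += freed

rx = Receiver(buffer_slots=4)
tx = Sender(initial_credits=rx.initial_credits)
sent = [tx.send(f"pkt{i}", rx) for i in range(5)]  # fifth packet stalls: [True]*4 + [False]
tx.credit_update(rx.drain())                       # credit update restores 4 credits
print(sent, tx.credits)

Over a long fiber the credit updates themselves take a full propagation delay to arrive, which is exactly why small on-chip buffers throttle throughput at distance.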


InfiniBand Throughput vs. Distance

Only sufficient buffer-to-buffer (B2B) credits, in conjunction with error-free optical transport, can ensure maximum InfiniBand performance over distance.

Without additional B2B credits, throughput drops significantly after several tens of meters; this is caused by an inability to keep the pipe full, because receive credits cannot be restored fast enough.

The required buffer credit size depends directly on the desired distance (the bandwidth-delay product of the link).

[Chart: throughput vs. distance, conceptually: without B2B credits throughput falls off rapidly with distance; with sufficient B2B credits it remains flat.]
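As a back-of-the-envelope illustration of how the credit requirement scales with distance, the following Python sketch (an added example with assumed numbers: 10 Gb/s line rate, 8 Gb/s usable data rate after 8b/10b, ~5 µs/km one-way propagation delay in fiber) estimates how much data must be bufferable to keep the pipe full:

# Bandwidth-delay product: bytes in flight that receive buffers/credits must cover.
def in_flight_bytes(distance_km: float, data_rate_gbps: float = 8.0,
                    fiber_delay_us_per_km: float = 5.0) -> float:
    """Data outstanding over one round trip on a credit-based link."""
    rtt_s = 2 * distance_km * fiber_delay_us_per_km * 1e-6
    return (data_rate_gbps * 1e9 / 8) * rtt_s

for km in (0.3, 10, 50, 100):
    print(f"{km:6.1f} km -> ~{in_flight_bytes(km) / 1e6:.3f} MB of receive buffering")

Under these assumptions, ~300 m corresponds to only a few kilobytes (within on-chip buffers), whereas 100 km requires roughly 1 MB of credit-backed buffering, which is why dedicated B2B credit memory is needed for distance transport.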



InfiniBand-over-Distance Transport

Point-to-point

Typically <100 km, but can be extended to arbitrary distances

Low latency (dominated by the propagation delay over the distance itself)

Transparent infrastructure (should support other protocols)

[Diagram: two data centers, each with a CPU/server cluster (IB HCAs attached to an IB switch fabric), a LAN, and an FC SAN, interconnected over a redundant 80 x 10G DWDM link; a gateway provides the 10GbE…100GbE connection to the NREN.]

IB SF – InfiniBand Switch Fabric


IB Transport Demonstrator Results

[Chart: SendRecv throughput (GB/s, 0–1) vs. message length (0–4000 kB) for distances of 0.4, 25.4, 50.4, 75.4, and 100.4 km.]

[Chart: SendRecv throughput (GB/s, 0–1) vs. distance (0–100 km) for message lengths of 32, 128, 512, and 4096 kB.]

N x 10G InfiniBand Transport over >50 km Distance demonstrated

ADVA FSP 3000 DWDM: up to 80 x 10 Gb/s transponders, <100 ns latency per transponder, max. reach 200/2000 km

Obsidian Campus C100: 4x SDR copper to serial 10G optical, 840 ns port-to-port latency, buffer credits for up to 100 km (test equipment ready for 50 km)

[Diagram: test setup with a C100 reach extender (B2B credits, SerDes) at each end of an 80 x 10G DWDM line.]


Solution Components

WCA-PC-10G WDM Transponder

Bit rates: 4.25 / 5.0 / 8.5 / 9.95 / 10.0 / 10.3 / 10.5 Gb/s

Applications: IBx1 DDR/QDR, IBx4 SDR, 10GbE WAN/LAN PHY, 4G-/8G-/10G-FC

Dispersion tolerance: up to 100 km w/o compensation

Wavelengths: DWDM (80 channels) and CWDM (4 channels)

Client port: 1 x XFP (850 nm MM, or 1310/1550 nm SM)

Latency <100 ns

Campus C100 InfiniBand Reach Extender

Optical bit rate 10.3 Gb/s (850 nm MM, 1310/1550 nm SM)

InfiniBand bit rate 8 Gb/s (4x SDR v1.2 compliant port)

Buffer credit range up to 100 km (depending on model)

InfiniBand node type: 2-port switch

Small-packet port-to-port latency: 840 ns

Packet forwarding rate: 20 Mp/s
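Putting the quoted component latencies together, here is a rough, hypothetical one-way latency budget in Python (assuming ~5 µs/km of fiber propagation delay; HCA and software stack time is not included):

# Rough one-way link latency budget from the component figures above.
C100_LATENCY_NS = 840          # Campus C100 port-to-port latency, per unit
TRANSPONDER_LATENCY_NS = 100   # WCA-PC-10G latency, per unit (upper bound)
FIBER_NS_PER_KM = 5_000        # ~5 us/km in standard single-mode fiber (assumed)

def one_way_latency_us(distance_km: float) -> float:
    """Two C100s + two transponders + fiber propagation, one direction."""
    fixed_ns = 2 * C100_LATENCY_NS + 2 * TRANSPONDER_LATENCY_NS
    return (fixed_ns + distance_km * FIBER_NS_PER_KM) / 1000

for km in (1.5, 10, 50, 100):
    print(f"{km:6.1f} km -> ~{one_way_latency_us(km):7.1f} us one-way")

At 1.5 km this lands near 9–10 µs, in the same range as the 10 µs memory-to-memory figure quoted in the NASA example later in this deck; at 100 km the fiber propagation delay (~0.5 ms) dominates everything else.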


Solution: 8x10G InfiniBand Transport (budgetary pricing)

FSP 3000 DWDM System (~100 km, dual-ended):
Chassis, PSUs, Controllers      ~€10.000,-
10G DWDM Modules                ~€100.000,-
Optics (Filters, Amplifiers)    ~€10.000,-
Sum (budgetary)                 ~€120.000,-

16 x Campus C100 (100 km)       ~€300.000,-

System total (budgetary)        ~€420.000,-


An Example…

NASA's largest supercomputer uses 16 Longbow C102 devices to span two buildings, 1.5 km apart, at a link speed of 80 Gb/s and a memory-to-memory latency of just 10 µs.


Thank you

IMPORTANT NOTICE

The content of this presentation is strictly confidential. ADVA Optical Networking is the exclusive owner or licensee of the content, material, and information in this presentation. Any reproduction, publication or reprint, in whole or in part, is strictly prohibited.

The information in this presentation may not be accurate, complete or up to date, and is provided without warranties or representations of any kind, either express or implied. ADVA Optical Networking shall not be responsible for and disclaims any liability for any loss or damages, including without limitation, direct, indirect, incidental, consequential and special damages, alleged to have been caused by or in connection with using and/or relying on the information contained in this presentation.

Copyright © for the entire content of this presentation: ADVA Optical Networking.

[email protected]