COMP680E by M. Hamdi: Introduction to High-Performance Internet Switches and Routers


Page 1: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

1COMP680E by M. Hamdi

Introduction to High-Performance Internet Switches and Routers

Page 2: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

2COMP680E by M. Hamdi

Network Architecture

[Figure: network hierarchy in three tiers. Campus / residential: access routers and an access switch. Metropolitan: edge routers and an edge switch. Core: core routers on the DWDM long-haul network. Link labels: GbE, 10GbE.]

Page 3: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

3COMP680E by M. Hamdi

[Figure: network map; the only surviving labels are repeated "POP" markers.]

Page 4: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

4COMP680E by M. Hamdi

How the Internet really is: Current Trend

Modems, DSL

SONET/SDH, DWDM

Page 5: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

5COMP680E by M. Hamdi

The Internet is a mesh of routers mostly interconnected by (ATM and) SONET (and DWDM).

[Figure: routers joined through TDM links and circuit-switched crossconnects, DWDM, etc.]

Page 6: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

6COMP680E by M. Hamdi

Typical (BUT NOT ALL) IP Backbone (Late 1990s)

[Figure labels: SONET/SDH DCS (twice), and four sets of Core Router, ATM Switch, MUX and SONET/SDH ADM.]

Page 7: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

7COMP680E by M. Hamdi

Points of Presence (POPs)

[Figure: POP1 to POP8, with attached hosts A, B, C, D, E and F.]

Page 8: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

8COMP680E by M. Hamdi

Where High Performance Routers are Used

[Figure: backbone topology of routers R1 to R16; the labelled links run at 2.5 Gb/s.]

Page 9: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

9COMP680E by M. Hamdi

Hierarchical arrangement

[Figure labels, from the edge inwards: end hosts (1000s per mux), access multiplexer, edge routers, core routers inside each Point of Presence (POP); links between POPs are labelled 10 Gb/s "OC192".]

POP: Point of Presence. Richly interconnected by a mesh of long-haul links. Typically: 40 POPs per national network operator; 10-40 core routers per POP.

Page 10: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

10COMP680E by M. Hamdi

Typical POP Configuration

[Figure labels: backbone routers; aggregation switches/routers (edge switches); 10G router-to-router intra-office links; 10G WAN transport links; DWDM/SONET terminal; transport network.]

More than 50% of high-speed interfaces are router-to-router (core routers).

Page 11: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

11COMP680E by M. Hamdi

Today's Network Equipment

Layer   | Protocol          | Equipment
Layer 3 | Internet Protocol | Routers
Layer 2 | FR & ATM          | Switches
Layer 1 | SONET             | SONET
Layer 0 | DWDM              | DWDM

Page 12: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

12COMP680E by M. Hamdi

Functions in a packet switch

[Figure: ingress linecard, interconnect and egress linecard, with a control plane above them. Blocks: framing, route lookup, TTL processing, buffering, interconnect scheduling, QoS scheduling. Three kinds of path are drawn: data path, control path and scheduling path.]

Page 13: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

13COMP680E by M. Hamdi

Functions in a circuit switch

[Figure: ingress linecard, interconnect and egress linecard, with a control plane above them. Blocks: framing and interconnect scheduling only. Two kinds of path are drawn: data path and control path.]

Page 14: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

14COMP680E by M. Hamdi

Our emphasis for now is to look at packet switches (IP, ATM, Ethernet, frame relay, etc.)

Page 15: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

15COMP680E by M. Hamdi

What a Router Looks Like

Router          | Dimensions          | Capacity | Power
Cisco GSR 12416 | 6 ft × 19" × 2 ft   | 160 Gb/s | 4.2 kW
Juniper M160    | 3 ft × 19" × 2.5 ft | 80 Gb/s  | 2.6 kW

Page 16: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

16COMP680E by M. Hamdi

A Router Chassis

Linecards

Fans/Power Supplies

Page 17: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

17COMP680E by M. Hamdi

Backplane

• A Circuit Board with connectors for line cards

• High speed electrical traces connecting line cards to fabric

• Usually passive

• Typically 30-layer boards

Page 18: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

18COMP680E by M. Hamdi

Line Card Picture

Page 19: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

19COMP680E by M. Hamdi

What do these two have in common?

Cisco CRS-1

Cisco Catalyst 3750G

Page 20: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

20COMP680E by M. Hamdi

What do these two have in common?

CRS-1 linecard

• 20” x (18”+11”) x 1RU

• 40Gbps, 80MPPS

• State-of-the-art 0.13u silicon

• Full IP routing stack including IPv4 and IPv6 support

• Distributed IOS

• Multi-chassis support

Cat 3750G Switch

• 19” x 16” x 1RU

• 52Gbps, 78 MPPS

• State-of-the-art 0.13u silicon

• Full IP routing stack including IPv4 and IPv6 support

• Distributed IOS

• Multi-chassis support

Page 21: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

21COMP680E by M. Hamdi

What is different between them?

Cisco CRS-1

Cisco Catalyst 3750G

Page 22: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

22COMP680E by M. Hamdi

A lot…

CRS-1 linecard

• Up to 1024 linecards

• Fully programmable forwarding

• 2M prefix entries and 512K ACLs

• 46Tbps 3-stage switching fabric

• MPLS support

• H-A non-stop routing protocols

Cat 3750G Switch

• Up to 9 stack members

• Hardwired ASIC forwarding

• 11K prefix entries and 1.5K ACLs

• 32Gbps shared stack ring

• L2 switching support

• Re-startable routing applications

Page 23: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

23COMP680E by M. Hamdi

Other packet switches

Cisco 7500 “edge” routers

Lucent GX550 Core ATM switch

DSL router

Page 24: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

24COMP680E by M. Hamdi

What is Routing?

[Figure: topology of hosts A to F and routers R1 to R5, with a packet addressed to D entering the network.]

Destination | Next Hop
D           | R3
E           | R3
F           | R5

Page 25: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

25COMP680E by M. Hamdi

What is Routing?

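[Figure: topology of hosts A to F and routers R1 to R5, with packets addressed to D in transit.]

Destination | Next Hop
D           | R3
E           | R3
F           | R5

IP packet header (field boundaries marked at bits 1, 4, 16 and 32; the fixed header is 20 bytes):

Ver | HLen | T.Service | Total Packet Length
Fragment ID | Flags | Fragment Offset
TTL | Protocol | Header Checksum
Source Address
Destination Address
Options (if any)
Data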

Page 26: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

26COMP680E by M. Hamdi

What is Routing?

[Figure: topology of hosts A to F and routers R1 to R5.]

Page 27: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

27COMP680E by M. Hamdi

Basic Architectural Elements of a Router

Control Plane ("Typically in Software")
• Routing
– Routing table update (OSPF, RIP, IS-IS)
– Admission Control
– Congestion Control
– Reservation

Switch (per-packet processing) ("Typically in Hardware")
• Routing Lookup
• Packet Classifier
• Switching
– Arbitration
– Scheduling

Page 28: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

28COMP680E by M. Hamdi

Basic Architectural Components
Datapath: per-packet processing

1. Forwarding Decision (one per input, each with its own Forwarding Table)
2. Interconnect
3. Output Scheduling

Page 29: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

29COMP680E by M. Hamdi

Per-packet processing in a Switch/Router

1. Accept packet arriving on an ingress line.

2. Lookup packet destination address in the forwarding table, to identify outgoing interface(s).

3. Manipulate packet header: e.g., decrement TTL, update header checksum.

4. Send packet to outgoing interface(s).

5. Queue until line is free.

6. Transmit packet onto outgoing line.
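Steps 2 and 3 above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the linear-scan longest-prefix match and the drop-on-expired-TTL rule are choices made for this example, not something the slides prescribe, and a real router performs these steps in hardware.

```python
import ipaddress

def internet_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit words, complemented."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def lookup(table, dst):
    """Longest-prefix match by linear scan (illustrative, far too slow for a real datapath)."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, port in table:
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return None if best is None else best[1]

def forward(header: bytearray, table):
    """Look up the destination, decrement TTL, refresh the checksum.

    Returns the outgoing port, or None if the packet is dropped
    (no route, or TTL would expire: the latter is an assumption of this sketch).
    """
    dst = str(ipaddress.ip_address(bytes(header[16:20])))
    port = lookup(table, dst)
    if port is None or header[8] <= 1:
        return None
    header[8] -= 1                          # byte 8 is the TTL field
    header[10:12] = b"\x00\x00"             # zero the checksum before recomputing
    header[10:12] = internet_checksum(bytes(header)).to_bytes(2, "big")
    return port
```

A header whose checksum is valid sums to zero under `internet_checksum`, which gives a quick way to check the result after forwarding.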

Page 30: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

30COMP680E by M. Hamdi

ATM Switch

• Lookup cell VCI/VPI in VC table.
• Replace old VCI/VPI with new.
• Forward cell to outgoing interface.
• Transmit cell onto link.

Page 31: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

31COMP680E by M. Hamdi

Ethernet Switch

• Lookup frame DA in forwarding table.
– If known, forward to correct port.
– If unknown, broadcast to all ports.
• Learn SA of incoming frame.
• Forward frame to outgoing interface.
• Transmit frame onto link.
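The learn-and-forward behaviour above can be sketched as follows. The class and method names are hypothetical, and two details are assumptions of this sketch rather than statements from the slide: an unknown destination is flooded to every port except the one the frame arrived on, and a frame whose destination sits on the arrival port is filtered.

```python
class LearningSwitch:
    """Minimal model of an Ethernet switch forwarding table."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}                     # MAC address -> port

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out on."""
        self.table[src] = in_port           # learn SA of incoming frame
        out = self.table.get(dst)           # lookup frame DA
        if out is None:                     # unknown: flood (assumed: not back out in_port)
            return [p for p in self.ports if p != in_port]
        if out == in_port:                  # destination is on the arrival segment
            return []
        return [out]
```

For example, on a three-port switch the first frame from `aa` to `bb` is flooded; once `bb` has replied, later frames to `bb` go to a single port.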

Page 32: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

32COMP680E by M. Hamdi

IP Router

• Lookup packet DA in forwarding table.
– If known, forward to correct port.
– If unknown, drop packet.
• Decrement TTL, update header checksum.
• Forward packet to outgoing interface.
• Transmit packet onto link.

Page 33: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

33COMP680E by M. Hamdi

Special per packet/flow processing

• The router can be equipped with additional capabilities to provide special services on a per-packet or per-class basis.
• The router can perform some additional processing on the incoming packets:
– Classifying the packet (IPv4, IPv6, MPLS, ...)
– Delivering packets according to a pre-agreed service, absolute or relative (e.g., send a packet within a given deadline, or give one packet better service than another; IntServ and DiffServ)
– Filtering packets for security reasons
– Treating multicast packets differently from unicast packets

Page 34: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

34COMP680E by M. Hamdi

Per-packet Processing Must be Fast!

1. Packet processing must be simple and easy to implement.
2. Memory access time is the bottleneck.

200 Mpps × 2 lookups/pkt = 400 Mlookups/sec → 2.5 ns per lookup

Year | Aggregate line rate | Arrival rate of 40B POS packets (million pkts/sec)
1997 | 622 Mb/s            | 1.56
1999 | 2.5 Gb/s            | 6.25
2001 | 10 Gb/s             | 25
2003 | 40 Gb/s             | 100
2006 | 80 Gb/s             | 200
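The arithmetic behind the lookup budget can be reproduced with a small sketch. The function names are made up for this example. One observation, offered as an assumption rather than something the slide states: the table's rates match 50 bytes per packet on the wire, which would be the 40-byte packet plus framing overhead.

```python
def packets_per_second(line_rate_bps, bytes_on_wire):
    """Worst-case arrival rate of back-to-back packets of a given size."""
    return line_rate_bps / (8 * bytes_on_wire)

def time_per_lookup_ns(pps, lookups_per_packet):
    """Time budget for one memory lookup, in nanoseconds."""
    return 1e9 / (pps * lookups_per_packet)

# 10 Gb/s with 50 bytes on the wire gives the table's 25 Mpps.
rate_2001 = packets_per_second(10e9, 50)

# 200 Mpps with 2 lookups per packet leaves 2.5 ns per lookup.
budget = time_per_lookup_ns(200e6, 2)
```

Against the DRAM access times quoted later in the deck (tens of nanoseconds), a 2.5 ns budget makes clear why a single commodity DRAM access per lookup does not fit.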

Page 35: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

35COMP680E by M. Hamdi

First Generation Routers

[Figure: a CPU with route table and buffer memory, and several line interfaces (each with a MAC), all attached to a shared backplane.]

Typically <0.5Gb/s aggregate capacity

Page 36: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

36COMP680E by M. Hamdi

Bus-based Router Architectures with Single Processor

• The first generation of IP routers
• Based on software implementations on a single general-purpose CPU.
• Limitations:
– Serious processing bottleneck in the central processor
– Memory-intensive operations (e.g. table lookup & data movements) limit the effectiveness of processor power
– The input/output (I/O) bus severely limits overall router throughput

Page 37: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

37COMP680E by M. Hamdi

Second Generation Routers

[Figure: a central CPU with route table and buffer memory, plus several line cards, each with a MAC, buffer memory and a forwarding cache.]

Typically <5Gb/s aggregate capacity

Page 38: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

38COMP680E by M. Hamdi

Bus-based Router Architectures with Multiple Processors

• Architectures with Route Caching
– Second generation IP routers
– Distribute packet forwarding operations
– Network interface cards
» Processors
» Route caches
– Packets are transmitted once over the shared bus
– Limitations:
» The central routing table is a bottleneck at high speeds
» Traffic-dependent throughput
» Shared bus is still a bottleneck

Page 39: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

39COMP680E by M. Hamdi

Limitation of IP Packet Forwarding based on Route Caching

• Routing changes invalidate existing cache entries, which must then be re-established
• The performance depends on:
– a. how big the cache is
– b. how the cache is maintained
– c. what the performance of the slow path is
• Solution:
– Using a forwarding database in each network interface
• Benefit:
– Performance, scalability, network resilience, and functionality

Page 40: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

40COMP680E by M. Hamdi

Third Generation Routers

[Figure: line cards (each with a MAC, local buffer memory and a forwarding table) and a CPU card (CPU, memory, routing table), all attached to a switched backplane.]

Typically <50Gb/s aggregate capacity

Page 41: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

41COMP680E by M. Hamdi

Switch-based Router Architectures with Fully Distributed Processors

• To avoid bottlenecks:

– Processing power

– Memory bandwidth

– Internal bus bandwidth

• Each network interface is equipped with appropriate processing power and buffer space.

Page 42: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

42COMP680E by M. Hamdi

Fourth Generation Routers/Switches
Optics inside a router for the first time

[Figure: linecards connected to a switch core by optical links, 100s of metres long.]

0.3 - 10Tb/s routers in development

Page 43: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

43COMP680E by M. Hamdi

[Figure: product photos. Alcatel 7670 RSP; Juniper TX8/T640; Chiaro; Avici TSR.]

Page 44: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

44COMP680E by M. Hamdi

Next Gen. Backbone Network Architecture: One backbone, multiple access networks

[Figure: a (G)MPLS-based multi-service intelligent packet backbone network with PE routers (service POPs). Attached through CE routers: a dual-stack IPv4-IPv6 enterprise network, a dual-stack IPv4-IPv6 DSL/FTTH/dial access network (residential users, telecommuters), a dual-stack IPv4-IPv6 cable network, and an ISP offering native IPv6 services. Other labels: IPv6 IX, ISPs, GGSN, SGSN.]

• One backbone network
• Maximizes speed, flexibility and manageability

Page 45: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

45COMP680E by M. Hamdi

Current Generation: Generic Router Architecture

[Figure: a packet (data plus header) passes through header processing (lookup IP address, update header), then is queued in buffer memory. The address table maps IP address to next hop and holds ~1M prefixes in off-chip DRAM; the buffer memory holds ~1M packets in off-chip DRAM.]

Page 46: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

46COMP680E by M. Hamdi

Current Generation: Generic Router Architecture (IQ)

[Figure: N input ports, each with header processing (lookup IP address against an address table, update header) and a packet queue in buffer memory; a scheduler arbitrates transfers to the N output ports.]

Page 47: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

47COMP680E by M. Hamdi

Current Generation: Generic Router Architecture (OQ)

[Figure: N input ports, each with header processing (lookup IP address against an address table, update header); packets are queued in buffer memory at the N output ports.]

Page 48: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

48COMP680E by M. Hamdi

Basic Architectural Elements of a Current Router

[Figure: typical IP router linecards connected through the backplane to a buffered or bufferless fabric (e.g. crossbar, bus) with a scheduler. Linecard blocks: physical layer, framing & maintenance, packet processing with lookup tables, buffer management & scheduling with buffer & state memory.]

OC192c linecard:
• ~10-30M gates
• ~2 Gbits of memory
• ~2 square feet
• >$10k cost; price $100K

Page 49: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

49COMP680E by M. Hamdi

Performance metrics

1. Capacity
– "maximize C, s.t. volume < 2 m³ and power < 5 kW"

2. Throughput
– Operators like to maximize usage of expensive long-haul links.

3. Controllable Delay
– Some users would like predictable delay.
– This is feasible with output-queueing plus weighted fair queueing (WFQ).

[Figure: WFQ scheduler with per-flow parameter pairs.]

Page 50: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

50COMP680E by M. Hamdi

Why do we Need Faster Routers?

1. To prevent routers from becoming the bottleneck in the Internet.

2. To increase POP capacity, and to reduce cost, size and power.

Page 51: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

51COMP680E by M. Hamdi

Why we Need Faster Routers
1: To prevent routers from being the bottleneck

[Figure: normalized growth since 1980, logarithmic axis from 1 to 1,000,000, years 1980 to 2001.]

Curve                    | Growth
DRAM random access time  | 1.1x / 18 months
Moore's Law              | 2x / 18 months
Router capacity          | 2.2x / 18 months
Line capacity            | 2x / 7 months
User traffic             | 2x / 12 months

Page 52: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

52COMP680E by M. Hamdi

Why we Need Faster Routers
1: To prevent routers from being the bottleneck

[Figure: disparity between traffic and router growth. Normalized growth (0 to 600) of traffic and of router capacity, 2003 to 2012, ending in a 5-fold disparity.]
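The 5-fold figure can be roughly reproduced from the growth rates quoted on the previous slide (user traffic 2x / 12 months, router capacity 2.2x / 18 months). This is a back-of-the-envelope sketch; the helper name is made up, and the 2003 to 2012 window is read off the chart's axis labels.

```python
def growth(factor, period_months, elapsed_months):
    """Compound growth: multiply by `factor` once every `period_months`."""
    return factor ** (elapsed_months / period_months)

months = 9 * 12                          # 2003 to 2012

traffic = growth(2.0, 12, months)        # doubles nine times
router = growth(2.2, 18, months)         # grows 2.2x six times
disparity = traffic / router             # roughly 4.5, i.e. about 5-fold
```

Under these rates traffic grows 512-fold while router capacity grows a little over 110-fold, which is consistent with the chart's vertical scale.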

Page 53: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

53COMP680E by M. Hamdi

Why we Need Faster Routers
2: To reduce cost, power & complexity of POPs

[Figure: POP with smaller routers versus POP with large routers.]

• Interfaces: Price >$200k, Power > 400W
• About 50-60% of interfaces are used for interconnection within the POP.
• Industry trend is towards large, single router per POP.
• Big POPs need big routers

Page 54: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

54COMP680E by M. Hamdi

A Case study: UUNET Internet Backbone Build Up

1999 View (4Q)

• 8 OC-48 links between POPs (not parallel)

2000 View (4Q)

• 52 OC-48 links between POPs: many parallel links

• 3 OC-192 Super POP links: multiple parallel interfaces between POPs (D.C. – Chicago; NYC – D.C.)

To meet the traffic growth, higher-performance routers with higher port speeds are required.

Page 55: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

55COMP680E by M. Hamdi

Why we Need Faster Routers 2: To reduce cost, power & complexity of POPs

[Figure labels, repeated three times: DSLAM, L3/4 switch, direct connects, CMTS.]

Further reduces CapEx and operational cost; further increases network stability.

Page 56: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

56COMP680E by M. Hamdi

Ideal POP

[Figure labels: gigabit routers; carrier optical transport (DWDM and optical switches); existing carrier equipment: SONET, ATM, Gigabit Ethernet, VoIP gateways, cable modem aggregation, digital subscriber line aggregation.]

Page 57: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

57COMP680E by M. Hamdi

Why are Fast Routers Difficult to Make?

1. Big disparity between line rates and memory access speed

[Figure: normalized growth rate on a logarithmic axis from 1 to 1,000,000.]

Page 58: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

58COMP680E by M. Hamdi

Problem: Fast Packet Buffers

Example: 40Gb/s packet buffer
Size = RTT*BW = 10Gb; 64 byte packets

[Figure: buffer manager in front of a buffer memory. Write rate R: 1 packet every 12.8 ns. Read rate R: 1 packet every 12.8 ns.]

Use SRAM?
+ fast enough random access time, but
- too low density to store 10Gb of data.

Use DRAM?
+ high density means we can store data, but
- too slow (50ns random access time).
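The two numbers in this example follow from simple arithmetic, sketched below. The helper names are hypothetical, and the 0.25 s round-trip time is inferred from the slide's own figures (10 Gb at 40 Gb/s) rather than stated on it.

```python
def packet_time_ns(packet_bytes, line_rate_bps):
    """Time one packet occupies the line, in nanoseconds."""
    return packet_bytes * 8 / line_rate_bps * 1e9

def buffer_bits(rtt_s, line_rate_bps):
    """Rule-of-thumb buffer size: round-trip time times bandwidth."""
    return rtt_s * line_rate_bps

per_packet = packet_time_ns(64, 40e9)    # 12.8 ns between 64-byte packets
size = buffer_bits(0.25, 40e9)           # 10 Gb for an assumed 0.25 s RTT
```

Since the buffer must absorb one write and one read per packet time, each memory operation has about 6.4 ns, which is within reach of SRAM but well short of a 50 ns DRAM access.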

Page 59: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

59COMP680E by M. Hamdi

Memory Technology (2006)

Technology      | Max single-chip density | $/chip ($/MByte)      | Access speed | Watts/chip
Networking DRAM | 64 MB                   | $30-$50 ($0.50-$0.75) | 40-80 ns     | 0.5-2 W
SRAM            | 8 MB                    | $50-$60 ($5-$8)       | 3-4 ns       | 2-3 W
TCAM            | 2 MB                    | $200-$250 ($100-$125) | 4-8 ns       | 15-30 W

Page 60: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

60COMP680E by M. Hamdi

How fast can a buffer be made?

[Figure: an external line feeding a buffer memory over a 64-byte wide bus in each direction; ~5 ns for SRAM, ~50 ns for DRAM.]

Rough estimate:
– 5 ns (SRAM) or 50 ns (DRAM) per memory operation.
– Two memory operations per packet.
– Therefore, maximum ~50 Gb/s (SRAM) or ~5 Gb/s (DRAM).

Aside: Buffers need to be large for TCP to work well, so DRAM is usually required.
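The rough estimate can be written out as a one-line formula. This is a sketch with a made-up function name; it assumes, as the slide does, that every packet costs one write and one read, each moving one full bus width.

```python
def max_line_rate_gbps(bus_bytes, access_ns, ops_per_packet=2):
    """Peak line rate a single memory can sustain.

    Bits moved per operation divided by the time for the operations
    needed per packet; bits per nanosecond is the same as Gb/s.
    """
    return bus_bytes * 8 / (access_ns * ops_per_packet)

sram = max_line_rate_gbps(64, 5)     # 51.2 Gb/s, the slide's "~50"
dram = max_line_rate_gbps(64, 50)    # 5.12 Gb/s, the slide's "~5"
```

The same formula with a 200-byte bus and 5 ns SRAM gives 160 Gb/s, the figure quoted later for a centralized shared memory switch.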

Page 61: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

61COMP680E by M. Hamdi

Packet Caches

[Figure: arriving packets enter a buffer manager with small SRAM caches holding the tails and heads of FIFO queues 1 to Q; the bulk of each queue sits in DRAM buffer memory. Packets move between SRAM and DRAM b >> 1 packets at a time, and departing packets are served from the cached FIFO heads.]

Page 62: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

62COMP680E by M. Hamdi

Why are Fast Routers Difficult to Make?

Packet processing gets harder

[Figure: instructions per arriving byte versus time. One curve shows what we'd like (more features: QoS, multicast, security, …); the other shows what will happen.]

Page 63: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

63COMP680E by M. Hamdi

Why are Fast Routers Difficult to Make?

[Figure: clock cycles per minimum-length packet since 1996; vertical axis 0 to 700, years 1996 to 2001.]

Page 64: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

64COMP680E by M. Hamdi

Options for packet processing

• General purpose processor
– MIPS
– PowerPC
– Intel
• Network processor
– Intel IXA and IXP processors
– IBM Rainier
– Control plane processors: SiByte (Broadcom), QED (PMC-Sierra).
• FPGA
• ASIC

Page 65: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

65COMP680E by M. Hamdi

General Observations

• Up until about 2000,
– Low-end packet switches used general purpose processors.
– Mid-range packet switches used FPGAs for datapath, general purpose processors for control plane.
– High-end packet switches used ASICs for datapath, general purpose processors for control plane.
• More recently,
– 3rd party network processors now used in many low- and mid-range datapaths.
– Home-grown network processors used in high-end.

Page 66: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

66COMP680E by M. Hamdi

Why are Fast Routers Difficult to Make?
Demand for Router Performance Exceeds Moore's Law

Growth in capacity of commercial routers (per rack):
– Capacity 1992 ~ 2Gb/s
– Capacity 1995 ~ 10Gb/s
– Capacity 1998 ~ 40Gb/s
– Capacity 2001 ~ 160Gb/s
– Capacity 2003 ~ 640Gb/s

Average growth rate: 2.2x / 18 months.
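The quoted average rate can be checked against the first and last data points (2 Gb/s in 1992, 640 Gb/s in 2003). The helper below is a sketch with a hypothetical name; it simply solves for the per-period factor of a compound growth curve.

```python
def growth_per_period(start, end, years, period_years=1.5):
    """Growth factor per period implied by compound growth from start to end."""
    return (end / start) ** (period_years / years)

# 320-fold growth over 11 years works out to about 2.2x every 18 months.
rate = growth_per_period(2, 640, 2003 - 1992)
```

For comparison, Moore's Law at 2x per 18 months would give only about 160-fold growth over the same 11 years.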

Page 67: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

67COMP680E by M. Hamdi

Maximizing the throughput of a router
Engine of the whole router

• Operators increasingly demand throughput guarantees:
– To maximize use of expensive long-haul links
– For predictability and planning
– Serve as many customers as possible
– Increase the lifetime of the equipment
• Despite lots of effort and theory, no commercial router today has a throughput guarantee.

Page 68: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

68COMP680E by M. Hamdi

Maximizing the throughput of a router
Engine of the whole router

[Figure: ingress linecard, interconnect and egress linecard, with a control plane above them. Blocks: framing, route lookup, TTL processing, buffering, interconnect scheduling, QoS scheduling. Three kinds of path are drawn: data path, control path and scheduling path.]

Page 69: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

69COMP680E by M. Hamdi

Maximizing the throughput of a router
Engine of the whole router

• This depends on the architecture of the switching:
– Input Queued
– Output Queued
– Shared memory
• It depends on the arbitration/scheduling algorithms within the specific architecture
• This is key to the overall performance of the router.

Page 70: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

70COMP680E by M. Hamdi

Why are Fast Routers Difficult to Make?

Power: It is exceeding the limit

[Figure: router power in kW (approximate), vertical axis 0 to 6, years 1990 to 2002.]

Page 71: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

71COMP680E by M. Hamdi

Switching Architectures

Page 72: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

72COMP680E by M. Hamdi

Generic Router Architecture

[Figure: N ports, each with header processing (lookup IP address against an address table, update header) and a packet queue in buffer memory; the buffer memories are labelled "N times line rate".]

Page 73: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

73COMP680E by M. Hamdi

Generic Router Architecture

[Figure: N ports, each with header processing (lookup IP address against an address table, update header) and a packet queue in buffer memory; a scheduler arbitrates transfers between the inputs and the N outputs.]

Page 74: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

74COMP680E by M. Hamdi

Interconnects
Two basic techniques

• Input Queueing: usually a non-blocking switch fabric (e.g. crossbar)
• Output Queueing: usually a fast bus

Page 75: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

75COMP680E by M. Hamdi

Simple model of output queued switch

[Figure: router R1 with Links 1 to 4. Each link has an ingress side and an egress side, all running at link rate R.]

Page 76: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

76COMP680E by M. Hamdi

Output Queued (OQ) Switch

How an OQ Switch Works

Page 77: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

77COMP680E by M. Hamdi

Characteristics of an output queued (OQ) switch

• Arriving packets are immediately written into the output queue, without intermediate buffering.

• The flow of packets to one output does not affect the flow to another output.

• An OQ switch has the highest throughput, and lowest delay.

• The rate of individual flows, and the delay of packets can be controlled (QoS).

Page 78: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

78COMP680E by M. Hamdi

The shared memory switch

[Figure: Links 1 to N, each with an ingress and an egress side at rate R, all attached to a single, physical memory device.]

Page 79: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

79COMP680E by M. Hamdi

Characteristics of a shared memory switch

Static queues: Assume memory of size $M$ bytes, and $Q_i(t)$ is the length of the queue for output $i$ at time $t$. If $Q_i(t) \le M/N$ for all $i$, then the switch operates the same as the basic output queued switch.

Dynamic queues: If queues can have any length, so long as $\sum_{i=1}^{N} Q_i(t) \le M$, then the loss rate is lower.
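The difference between the two policies shows up in the admission test applied to an arriving packet. The sketch below is illustrative only: the function names and the byte-granularity accounting are assumptions made for this example.

```python
def admit_static(queues, out, size, memory):
    """Static partition: each of the N outputs owns memory / N bytes."""
    return queues[out] + size <= memory / len(queues)

def admit_dynamic(queues, out, size, memory):
    """Dynamic sharing: any queue may grow while total occupancy fits."""
    return sum(queues) + size <= memory

# Four outputs sharing 1000 bytes; output 0 is busy, the others are idle.
queues = [240, 0, 0, 0]
static_ok = admit_static(queues, 0, 64, 1000)    # False: 304 > 250
dynamic_ok = admit_dynamic(queues, 0, 64, 1000)  # True: 304 <= 1000
```

The static policy drops the packet even though three quarters of the memory is empty, which is the intuition behind the lower loss rate of dynamic queues.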

Page 80: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

80COMP680E by M. Hamdi

Memory bandwidthMemory bandwidth

Basic OQ switch:
• Consider an OQ switch with N different physical memories, and all links operating at rate R bits/s.
• In the worst case, packets may arrive continuously from all inputs, destined to just one output.
• Maximum memory bandwidth requirement for each memory is (N+1)R bits/s.

Shared Memory Switch:
• Maximum memory bandwidth requirement for the memory is 2NR bits/s.
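A short sketch makes the two formulas concrete. The function names and the example port count and rate are illustrative choices, not values from the slides.

```python
def oq_memory_bw(n_ports, rate_bps):
    """Per-memory bandwidth in an OQ switch: N writes plus 1 read per packet time."""
    return (n_ports + 1) * rate_bps

def shared_memory_bw(n_ports, rate_bps):
    """Single shared memory: N writes plus N reads per packet time."""
    return 2 * n_ports * rate_bps

# Example: 32 ports at 10 Gb/s each.
oq = oq_memory_bw(32, 10e9)            # 330 Gb/s for every output memory
shared = shared_memory_bw(32, 10e9)    # 640 Gb/s for the one shared memory
```

Both requirements grow linearly with the port count N, which is what limits the size of output queued and shared memory switches.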

Page 81: COMP680E by M. Hamdi 1 Introduction to High- Performance Internet Switches and Routers

81COMP680E by M. Hamdi

How fast can we make a centralized shared memory switch?

[Figure: ports 1 to N attached to a shared memory built from 5ns SRAM over a 200 byte bus.]

• 5ns per memory operation
• Two memory operations per packet
• Therefore, up to 160Gb/s (200 x 8 bits / 10 ns)
• In practice, closer to 80Gb/s

Output Queueing: The “ideal”

[Figure: output-queued switch, with packets labelled 1 and 2.]

How to Solve the Memory Bandwidth Problem?

Use Input Queued Switches
• In the worst case, one packet is written and one packet is read from an input buffer
• Maximum memory bandwidth requirement for each memory is 2R bits/s.
• However, using FIFO input queues can result in what is called “Head-of-Line (HoL)” blocking

Input Queueing: Head of Line Blocking

[Figure: delay versus load; with FIFO input queues the delay grows without bound as the load approaches 58.6%, well below 100%.]
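The 58.6% limit can be reproduced with a small saturation experiment. The sketch below assumes uniform random destinations, fixed-size cells and inputs that always have a cell waiting; for large N the throughput is known to approach 2 - sqrt(2), about 0.586. The function name and parameters are illustrative.

```python
import random


def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Fraction of output capacity used by saturated FIFO input queues."""
    rng = random.Random(seed)
    # Destination of the head-of-line cell at every input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    served = 0
    for _ in range(n_slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        # Each requested output serves one contender; the losers stay blocked.
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # next cell moves to the head
            served += 1
    return served / (n_ports * n_slots)


throughput = hol_saturation_throughput(n_ports=16, n_slots=2000)
```

For small switches the measured value sits a little above the asymptotic limit and drifts toward it as N grows.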

Head of Line Blocking

[Figures (three slides): head-of-line blocking illustrated.]

Virtual Output Queues (VoQ)

• Virtual Output Queues:
  – At each input port, there are N queues, each associated with an output port
  – Only one packet can go from an input port at a time
  – Only one packet can be received by an output port at a time

• It retains the scalability of FIFO input-queued switches

• It eliminates the HoL problem with FIFO input queues
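A minimal sketch of the VoQ idea follows. The class and function names are hypothetical, and the scheduler is a plain greedy pass in fixed input order, chosen only to show the two constraints (one cell per input, one cell per output); it is not a fair or practical scheduling algorithm.

```python
from collections import deque


class VoqInput:
    """One input port holding N virtual output queues."""

    def __init__(self, n_outputs):
        self.queues = [deque() for _ in range(n_outputs)]

    def enqueue(self, packet, output):
        self.queues[output].append(packet)


def schedule_one_slot(inputs):
    """Greedy matching: at most one packet per input and per output."""
    n_outputs = len(inputs[0].queues)
    output_taken = [False] * n_outputs
    transfers = []
    for i, port in enumerate(inputs):
        for out in range(n_outputs):
            if port.queues[out] and not output_taken[out]:
                output_taken[out] = True
                transfers.append((i, out, port.queues[out].popleft()))
                break  # this input has sent its one packet
    return transfers


ports = [VoqInput(2), VoqInput(2)]
ports[0].enqueue("a", 0)
ports[1].enqueue("c", 0)  # would block "d" behind it in a single FIFO
ports[1].enqueue("d", 1)
first_slot = schedule_one_slot(ports)
```

In the first slot input 1 loses output 0 to input 0 but still sends "d" to output 1, which a single FIFO queue would not allow.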

Input Queueing: Virtual output queues

[Figure: input-queued switch with virtual output queues at each input.]

Input Queues: Virtual Output Queues

[Figure: delay versus load; with virtual output queues the load can approach 100%.]

Input Queueing (VoQ)

[Figure: VoQ switch with a scheduler. Memory b/w = 2R. The scheduler can be quite complex!]

Combined IQ/SQ Architecture

Can be a good compromise

[Figure: inputs 1 to N send packets (data) through a routing fabric to N output queues held in one shared memory; flow control runs back to the inputs.]

A Comparison: Memory speeds for 32x32 switch

Cell size = 64 bytes

             Shared-Memory                        Input-queued
Line Rate    Memory BW    Access Time per cell    Memory BW    Access Time
100 Mb/s     6.4 Gb/s     80 ns                   200 Mb/s     2.56 µs
1 Gb/s       64 Gb/s      8 ns                    2 Gb/s       256 ns
2.5 Gb/s     160 Gb/s     3.2 ns                  5 Gb/s       102.4 ns
10 Gb/s      640 Gb/s     0.8 ns                  20 Gb/s      25.6 ns
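The table entries follow from the 2NR and 2R bandwidth requirements and the 64-byte cell. A short sketch that recomputes one row (the function and key names are assumptions for illustration):

```python
def memory_requirements(n_ports, line_rate_bps, cell_bytes=64):
    """Memory bandwidth and per-cell access time for both architectures."""
    cell_bits = cell_bytes * 8
    shared_bw = 2 * n_ports * line_rate_bps  # N writes + N reads
    input_bw = 2 * line_rate_bps             # one write + one read
    return {
        "shared_bw_bps": shared_bw,
        "shared_access_ns": cell_bits / shared_bw * 1e9,
        "input_bw_bps": input_bw,
        "input_access_ns": cell_bits / input_bw * 1e9,
    }


row = memory_requirements(n_ports=32, line_rate_bps=10**9)
```

For the 1 Gb/s row this gives 64 Gb/s and 8 ns for shared memory, against 2 Gb/s and 256 ns for input queueing.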

Scalability of Switching Fabrics

Shared Bus
• It is the simplest interconnect possible

• Protocols are very well established

• Multicasting and broadcasting are natural

• It has a scalability problem, as we cannot have multiple transmissions concurrently

• Its maximum bandwidth is around 100 Gbps, which limits the maximum number of I/O ports and/or the line rates

• It is typically used for “small” shared memory switches or output-queued switches; it is a very good choice for Ethernet switches

Crossbars
• It is becoming the preferred interconnect for high-speed switches

• It has very high throughput, and supports QoS and multicast

• N^2 crosspoints, but this is no longer the real limitation

[Figure: crossbar with Data In, Data Out and a configuration input.]

Limiting factors

Crossbar switch

– N^2 crosspoints per chip,

– It’s not obvious how to build a crossbar from multiple chips,

– Capacity of “I/O”s per chip.

• State of the art: About 200 pins each operating at 3.125Gb/s ~= 600Gb/s per chip.

• About 1/3 to 1/2 of this capacity available in practice because of overhead and speedup.

• Crossbar chips today are limited by the “I/O” capacity.

Limitations to Building Large Crossbar Switches: I/O pins

• Maximum practical bit rate per pin ~ 3.125 Gbits/sec
  – At this speed you need between 2 and 4 pins per single bit
  – To achieve a 10 Gb/s (OC-192) line rate, you need around 4 parallel data lines (4-bit parallel transmission)
  – For example, consider a 4-bit data-parallel 64-input crossbar that is designed to support OC-192 line rates per port. Each port interface would require 4 x 3 = 12 pins in each direction. Hence a 64-port crossbar would need 12 x 64 x 2 = 1536 pins just for the I/O data lines
  – Hence, the real problem is I/O pin limitations

• How to solve the problem?

Scaling: Trying to build a crossbar from multiple chips

[Figure: a building-block chip with 4 inputs and 4 outputs, tiled to form a 16x16 crossbar switch.]

Eight inputs and eight outputs required!

How to build a scalable crossbar

1. Use bit slicing (parallel crossbars)
• For example, we can implement the 4-bit crossbar of the previous example with 4 parallel 1-bit crossbars.

• Each port interface would require 1 x 3 = 3 pins in each direction. Hence a 64-port crossbar would need 3 x 64 x 2 = 384 pins for the I/O data lines, which is reasonable (but we need 4 chips here).
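The pin counts for the single-chip and bit-sliced designs can be checked with one formula. The sketch assumes 3 pins per bit, the value used in the slides' arithmetic (the stated range is 2 to 4); the function name is illustrative.

```python
def data_pins(ports, bits_per_port, pins_per_bit=3):
    """I/O data pins per chip: every port needs pins in both directions."""
    return bits_per_port * pins_per_bit * ports * 2


single_chip = data_pins(ports=64, bits_per_port=4)  # 4-bit wide crossbar
per_slice = data_pins(ports=64, bits_per_port=1)    # one of 4 bit-slice chips
```

Bit slicing does not reduce the total pin count; it spreads the same 1536 pins over four chips of 384 pins each.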

Scaling: Bit-slicing

• Cell is “striped” across multiple identical planes.

• Crossbar switched “bus”.

• Scheduler makes same decision for all slices.

[Figure: a linecard stripes each cell across crossbar planes 1 to N, all driven by one scheduler.]
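Striping a cell across identical planes and reassembling it at the egress can be sketched as follows. For simplicity the example stripes whole bytes rather than individual bits, which is an assumption of the sketch; the function names are illustrative.

```python
def stripe(cell, n_planes):
    """Split a cell into n_planes slices; plane k carries bytes k, k+n, ..."""
    return [cell[k::n_planes] for k in range(n_planes)]


def reassemble(slices):
    """Interleave the slices back into the original cell."""
    n_planes = len(slices)
    cell = bytearray(sum(len(s) for s in slices))
    for k, piece in enumerate(slices):
        cell[k::n_planes] = piece
    return bytes(cell)


cell = bytes(range(64))     # one 64-byte cell
slices = stripe(cell, 8)    # 8 planes, 8 bytes each
restored = reassemble(slices)
```

Because every plane carries a slice of the same cell, all planes use the same crossbar configuration in a given cell time, which is why one scheduler decision serves all slices.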

Scaling: Time-slicing

• Cell goes over one plane; takes N cell times.

• Scheduler is unchanged.

• Scheduler makes decision for each slice in turn.

[Figure: a linecard sends successive cells to crossbar planes 1 to N in turn, under one scheduler.]

HKUST 10Gb/s 256x256 Crossbar Switch Fabric Design

• Our overall switch fabric is an OC-192 256*256 crossbar switch

• Such a system is composed of 8 256*256 crossbar chips, each running at 2Gb/s (to compensate for the overhead and to provide a switch speedup)

[Figure: a 10Gb/s input feeds a deserializer (DES) that fans out 8 signals to the 256*256 crossbar switch chips; a serializer (SER) recombines the 8 signals into the 10Gb/s output; a scheduler controls the chips.]

• The Deserializer (DES) converts the OC-192 10Gb/s data at the fiber link to 8 low speed signals, while the Serializer (SER) serializes the low speed signals back to the fiber link

Architecture of the Crossbar Chip

• Crossbar Switch Core: fulfills the switch functions

• Control: configures the crossbar core

• High speed data link: communicates between this chip and SER/DES

• PLL: provides on-chip precise clock

[Figure: a 1GHz 256*256 crossbar switch core surrounded by four high speed data links, with the controller and the PLL.]

Technical Specification of our Core-Crossbar Chip

Full crossbar core    256*256 (embedded with 2 bit-slices)
Technology            TSMC 0.25 µm SCN5M Deep (lambda = 0.12 µm)
Layout size           14 mm * 8 mm
Transistor count      2000k
Supply voltage        2.5 V
Clock frequency       1 GHz
Power                 40 W

Layout of a 256*256 crossbar switch core

[Figure: chip layout.]

HKUST Crossbar Chip in the News

Researchers offer alternative to typical crossbar design
http://www.eetimes.com/story/OEG20020820S0054
By Ron Wilson - EE Times
August 21, 2002 (10:56 a.m. ET)

PALO ALTO, Calif. - In a technical paper presented at the Hot Chips conference here Monday (Aug. 19), researchers Ting Wu, Chi-Ying Tsui and Mounir Hamdi from Hong Kong University of Science and Technology (China) offered an alternative pipeline approach to crossbar design.

Their approach has yielded a 256-by-256 signal switch with a 2-GHz input bandwidth, simulated in a 0.25-micron, 5-metal process.

The growing importance of crossbar switch matrices, now used for on-chip interconnect as well as for switching fabric in routers, has led to increased study of the best ways to build these parts.

Scaling a crossbar

• Conclusion: scaling the capacity is relatively straightforward (although the chip count and power may become a problem).

• In each scheme so far, the number of ports stays the same, but the speed of each port is increased.

• What if we want to increase the number of ports?

• Can we build a crossbar-equivalent from multiple stages of smaller crossbars?

• If so, what properties should it have?

Multi-Stage Switches

Basic Switch Element

[Figure: a 2x2 switch element with inputs 0, 1 and outputs 0, 1.]

Two States
• Cross
• Through

Optional Buffering

This is equivalent to crosspoint in the crossbar (no longer a good argument)

Example of Multistage Switch

• It needs N log N internal switches (crosspoints), fewer than the crossbar

[Figure: an 8-port network (N = 8, ports 000 to 111) built from stages of 2x2 elements; consecutive stages are wired by a perfect shuffle, which interleaves one half of the deck with the other half of the deck.]

Packet Routing

The bits of the destination address provide the required routing tags. The digits in the destination address are used to set the state of the stages.

[Figure: 8x8 network (ports 000 to 111) with Stage 1, Stage 2 and Stage 3 joined by perfect shuffles; cells carrying destination port addresses 011 and 101 are routed, and the white bit of the address controls the switch setting in each stage.]

Internal blocking

• Internal link blocking as well as output blocking can happen in a multistage switch. The following example illustrates internal blocking for the connections of input 0 to output 3 and input 4 to output 2.

[Figure: the same 8x8 shuffle network; the cells addressed to 011 and 010 need the same internal link (the blocking link), so one of them cannot proceed.]

Output Blocking

The following example illustrates output blocking for the connections between input 1 and output 6, and input 3 and output 6.

[Figure: the same 8x8 shuffle network; both cells carry address 110 and collide at output 6 (output blocking).]
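Destination-tag routing and both kinds of blocking can be traced in a few lines. The sketch assumes the wiring suggested by the figures: inputs attach directly to stage 1, consecutive stages are joined by a perfect shuffle, and stage k sets the low bit of the line number to the k-th most significant destination bit. The function names are illustrative.

```python
def shuffle(pos, n_bits):
    """Perfect shuffle: rotate the n_bits-wide line number left by one."""
    msb = (pos >> (n_bits - 1)) & 1
    return ((pos << 1) & ((1 << n_bits) - 1)) | msb


def route(src, dst, n_bits=3):
    """Return the (stage, output line) links a cell occupies."""
    pos = src
    links = []
    for stage in range(n_bits):
        if stage > 0:
            pos = shuffle(pos, n_bits)
        tag_bit = (dst >> (n_bits - 1 - stage)) & 1  # MSB first
        pos = (pos & ~1) | tag_bit  # 2x2 element picks upper or lower output
        links.append((stage + 1, pos))
    return links


def shared_links(conn_a, conn_b):
    """Links needed by both (src, dst) connections."""
    return set(route(*conn_a)) & set(route(*conn_b))


internal = shared_links((0, 3), (4, 2))  # clash inside the fabric
output = shared_links((1, 6), (3, 6))    # clash only at the final output
```

Under these assumptions the first pair collides on a stage 2 link even though the destinations differ, while the second pair shares only the final link into output 6.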

A Solution: Batcher Sorter

• One solution to the contention problem is to sort the cells into monotonically increasing order based on desired destination port

• Done using a bitonic sorter called a Batcher

• Places the M cells into gap-free increasing sequence on the first M input ports

• Eliminates duplicate destinations
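A recursive bitonic sort shows the compare-exchange pattern such a sorter is built from. This is a software sketch rather than a hardware description; it assumes a power-of-two number of ports and uses a hypothetical IDLE marker larger than any port number, so that idle inputs sort to the end and the M real cells land gap-free on the first M lines.

```python
IDLE = 8  # assumed marker for an input with no cell (larger than any port)


def bitonic_merge(values, ascending):
    """Merge a bitonic sequence into sorted order."""
    n = len(values)
    if n <= 1:
        return list(values)
    values = list(values)
    half = n // 2
    for i in range(half):
        # One column of compare-exchange elements.
        if (values[i] > values[i + half]) == ascending:
            values[i], values[i + half] = values[i + half], values[i]
    return (bitonic_merge(values[:half], ascending)
            + bitonic_merge(values[half:], ascending))


def bitonic_sort(values, ascending=True):
    """Sort a power-of-two-length list with the bitonic network pattern."""
    n = len(values)
    if n <= 1:
        return list(values)
    half = n // 2
    first = bitonic_sort(values[:half], True)    # ascending half
    second = bitonic_sort(values[half:], False)  # descending half
    return bitonic_merge(first + second, ascending)


sorted_cells = bitonic_sort([1, 0, 4, 6, 7, 3, IDLE, IDLE])
```

The comparisons made do not depend on the data, which is what lets the same pattern be wired as a fixed network of 2x2 comparators.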

Batcher-Banyan Example

[Figure sequence (seven slides): six cells addressed to outputs 0, 1, 3, 4, 6 and 7 enter an 8-port network (ports 0 to 7) and move through the sorting stages one step per slide, finishing in the order 0, 1, 3, 4, 6, 7.]

Simple Sort & Route Network

• Simple components with no buffering.
  – filter eliminates duplicates by comparing consecutive addresses and returns ack to inputs
  – adder computes and inserts “rank” of cells
  – concentrator uses rank as output address
  – routing network delivers to output

• Adder, concentrator and routing network all have log2 n stages

[Figure: pipeline Sort, Filter, Add, Conc., Route. Input destinations 3, 6, 0, 5, 3, 6, 4, 3 are sorted to 0, 3, 3, 3, 4, 5, 6, 6; the filter keeps 0, 3, 4, 5, 6; the adder assigns ranks 0 to 4; the concentrator and routing network deliver the surviving cells to outputs 0, 3, 4, 5, 6.]
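The sort, filter and add steps can be mimicked in software to check the example values. The sketch uses Python's built-in sort in place of the sorting network and a hypothetical function name; it returns (rank, destination) pairs for the cells that survive the filter.

```python
def sort_filter_rank(destinations):
    """Sort cells, drop duplicate destinations, then rank the survivors."""
    ordered = sorted(destinations)
    survivors = []
    for i, dest in enumerate(ordered):
        # The filter compares each address with the one just before it.
        if i == 0 or dest != ordered[i - 1]:
            survivors.append(dest)
    # The adder gives each surviving cell its rank (a running count).
    return list(enumerate(survivors))


ranked = sort_filter_rank([3, 6, 0, 5, 3, 6, 4, 3])
```

The concentrator would then place the cell with rank r on line r, so the routing network sees a gap-free, duplicate-free, sorted input.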

3-stage Clos Network

[Figure: N = n x m inputs enter m first-stage switches of size n x k; these feed k middle-stage switches of size m x m, which feed m third-stage switches of size k x n serving the N outputs; k >= n.]

Clos-network Blocking

• Blocking

– When a connection is made it can exclude the possibility of certain other connections being made

• Non-blocking

– A new connection can always be made without disturbing the existing connections

• Rearrangeably non-blocking

– A new connection can be made but it might be necessary to reconfigure some other connections on the switch

[Figure, first panel: 4-input, 4-output network. A connection request from input 4 to output 1 is blocked; the connection cannot be set up between input 4 and output 1.]

[Figure, second panel: the same connection request can be satisfied by rearranging the existing connection from input 2 to output 2; the connection can now be set up between input 4 and output 1.]

Clos-network Properties: Expansion factors

• Strictly Nonblocking iff m >= 2n - 1

• Rearrangeable Nonblocking iff m >= n

• Complexity: $O(N^{3/2})$ for N ports

• First nonblocking switch discovered with complexity less than $O(N^2)$
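The two conditions are easy to test for a given design. In the sketch below, n is the number of inputs per first-stage switch and m the number of middle-stage switches, as in the conditions above; r (the number of first-stage switches) and the crosspoint formula with r x r middle switches are textbook assumptions added for illustration, as are the function names.

```python
def clos_class(n, m):
    """Classify a symmetric 3-stage Clos network."""
    if m >= 2 * n - 1:
        return "strictly nonblocking"
    if m >= n:
        return "rearrangeably nonblocking"
    return "blocking"


def clos_crosspoints(n, m, r):
    """Crosspoints with r (n x m) input switches, m (r x r) middle
    switches and r (m x n) output switches."""
    return 2 * r * n * m + m * r * r


kind = clos_class(n=32, m=48)                # expansion 48/32 = 1.5
count = clos_crosspoints(n=32, m=48, r=32)   # a 1024-port fabric
single_crossbar = 1024 * 1024
```

With an expansion factor of 1.5 the fabric is rearrangeably but not strictly nonblocking, and under these assumptions it uses far fewer crosspoints than a single 1024x1024 crossbar.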

3-stage Fabrics (basic building block: a crossbar): Clos Network

[Figure: 3-stage Clos fabric built from crossbars.]

3-Stage Fabrics: Clos Network

Expansion factor required = 2 - 1/N (but still blocking for multicast)

4-Port Clos Network: Strictly Non-blocking

[Figure: two 2x3 input switches, three 2x2 middle switches and two 3x2 output switches.]

Construction example

• Switch size: 1024x1024

• Construction module
  – Input switch: thirty-two 32x48
  – Central switch: forty-eight 48x48
  – Output switch: thirty-two 48x32
  – Expansion: 48/32 = 1.5

[Figure: input switches 32x48 #1 to #32 (ports 1 to 32, 33 to 64, ..., 993 to 1024), central switches 48x48 #1 to #48, output switches 48x32 #1 to #32.]

Lucent Architecture

[Figure: Lucent switch architecture, with buffers.]

MSM Architecture

[Figure: MSM architecture.]

Cisco’s 46Tbps Switch System

[Figure: 72 line card chassis, LCC(1) to LCC(72), each holding 16 of the line cards LC(1) to LC(1152) and 8 of the 18 x 18 S1/S3 elements S1/S3(1) to S1/S3(576); 8 fabric card chassis, FCC(1) to FCC(8), each holding 18 of the 72 x 72 S2 elements S2(1) to S2(144). Links are labelled 40G and 12.5G.]

• total 80 chassis
• 8 sw planes
• speedup 2.5
• 1152 LICs
• 1296x1296 switch fabric
• 3-stage Benes sw
• multicast in the sw
• 1:N fabric redundancy
• 40 Gbps packet processor (188 RISCs)

Massively Parallel Switches

• Instead of using tightly coupled fabrics like a crossbar or a bus, they use massively parallel interconnects such as hypercube, 2D torus, and 3D torus.

• Few companies use this design architecture for their core routers

• These fabrics are generally scalable

• However:

– It is very difficult to guarantee QoS and to include value-added functionalities (e.g., multicast, fair bandwidth allocation)

– They consume a lot of power

– They are relatively costly

[Figure: massively parallel switch interconnects.]

3D Switching Fabric: Avici

• Three components
  – Topology: 3D torus
  – Routing: source routing with randomization
  – Flow control: virtual channels and virtual networks

• Maximum configuration: 14 x 8 x 5 = 560
• Channel speed is 10 Gbps

Packaging
• Uniformly short wires between adjacent nodes
  – Can be built in passive backplanes
  – Run at high speed

Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)

Avici: Velociti™ Switch Fabric

• Toroidal direct connect fabric (3D Torus)
• Scales to 560 active modules
• Each element adds switching & forwarding capacity
• Each module connects to 6 other modules
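The "6 other modules" follow from the torus topology: each node has two neighbours per dimension, with wrap-around at the edges. A minimal sketch, assuming nodes are addressed by (x, y, z) coordinates in the 14 x 8 x 5 maximum configuration (the addressing scheme and function name are illustrative, not Avici's):

```python
def torus_neighbors(node, dims=(14, 8, 5)):
    """Neighbours of a node in a 3D torus with wrap-around links."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            coords[axis] = (coords[axis] + step) % size  # wrap at the edge
            neighbors.add(tuple(coords))
    neighbors.discard(node)  # only relevant for a dimension of size 1
    return sorted(neighbors)


corner = torus_neighbors((0, 0, 0))
total_modules = 14 * 8 * 5
```

Every node, including those at the "corners", has the same six neighbours' worth of links, which is what keeps the wiring uniform.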


Switch fabric chips comparison