Page 1: Anatomy of an IP Router Vahid Tabatabaee Fall 2007

ENTS689L: Packet Processing and Switching

Anatomy of an IP Router

Vahid Tabatabaee

Fall 2007

Page 2

References

Panos C. Lekkas, “Network Processors: Architectures, Protocols, and Platforms”, McGraw-Hill

James Aweya, “IP Router Architectures: An Overview”, Nortel Networks, Ottawa, Canada

Florian Brodersen, Alexander Klinetschek, “Anatomy of a High Performance IP router”, Communication Network Seminar 2003/04, Hasso-Plattner-Institute, University of Potsdam, Jan. 2004

Steve Kohalmi, Tim Hale, “Anatomy of an IP Service Edge Switch”, Quarry Technologies, 2002.

Cisco Systems CRS-1 router documents.

Page 3

Basic IP Router Components

Network Interfaces

Processing Modules

Buffering Modules

Interconnection Unit (switch fabric)

The processing and buffering modules may be replicated either fully or partially on the network interfaces.

Path computation, Routing Table Maintenance

Packet Forwarding, Packet Processing, May cache routing table

Transfer Packets between Ingress and Egress Interface (Line) Cards

Page 4

Basic Functions of a Router

Route Processing (routing protocols: OSPF, RIP, …): Slow Path or Control Plane
- Path Computation
- Routing Table Maintenance
- Reachability Propagation

Packet Forwarding: Fast Path or Data Plane

Page 5

Packet Forwarding

IP Packet Validation
- Version number
- Header length field
- Checksum

Destination IP address parsing and table lookup
- Local delivery in the network
- Unicast delivery to an output port
- Multicast delivery to a set of output ports

Packet Lifetime Control
- Adjust the time-to-live (TTL) field
- A packet with positive TTL is delivered to a local address
- A packet delivered to output ports has its TTL decremented and rechecked before forwarding

Packet Fragmentation
- Check if the packet size is larger than the MTU of the network. If yes, fragment the packet.
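
The forwarding steps listed on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`ipv4_checksum`, `validate`, `forward`, `make_header`) are hypothetical, table lookup is left out, and the fragmentation check ignores the Don't Fragment flag.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of the 16-bit header words, complemented."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def validate(header: bytes) -> bool:
    """Check version number, header length field and checksum."""
    if len(header) < 20:
        return False
    version, ihl = header[0] >> 4, header[0] & 0x0F
    if version != 4 or ihl < 5 or len(header) < ihl * 4:
        return False
    # A header with a correct checksum field sums to zero after complement.
    return ipv4_checksum(header[:ihl * 4]) == 0

def forward(header: bytes, packet_len: int, mtu: int):
    """Return (action, header) for a packet bound for an output port."""
    if not validate(header):
        return "drop", header
    ttl = header[8]
    if ttl <= 1:                            # TTL would reach zero: do not forward
        return "drop", header
    hdr = bytearray(header)
    hdr[8] = ttl - 1                        # decrement TTL
    hdr[10:12] = b"\x00\x00"                # clear, then recompute the checksum
    ihl_bytes = (hdr[0] & 0x0F) * 4
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr[:ihl_bytes])))
    action = "fragment" if packet_len > mtu else "forward"
    return action, bytes(hdr)

def make_header(ttl: int, total_len: int = 40) -> bytes:
    """Hypothetical helper: build a 20-byte test header with a valid checksum."""
    hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, total_len, 0, 0,
                                ttl, 6, 0, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])))
    struct.pack_into("!H", hdr, 10, ipv4_checksum(bytes(hdr)))
    return bytes(hdr)
```

Calling `forward(make_header(ttl=64), packet_len=40, mtu=1500)` yields the action `"forward"` and a header whose TTL is 63 and whose checksum is again valid.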

Page 6

First Generation of Routers

Similar to a typical computer layout.

All functionality is implemented in software.

Single CPU, single Memory, Single Bus!

Page 7

Problems with first generation routers

Processing speed is limited by the single CPU.

The CPU must process all packets: those destined to the router itself and those passing through it.

Major packet processing tasks such as table lookups are memory-intensive operations and cannot be made faster by simple processor upgrades.

Software implementation is inefficient, since the work is a small set of operations repeated on all packets.

Slow path and fast path are implemented on the same CPU. Therefore, the slow path can influence the fast path.

The routing table size has grown from 20,000 entries in 1994 to 260,000 entries today. Source: http://bgp.potaroo.net/

Moving data from one interface to another can be time consuming and often takes longer than the packet processing itself.

Page 8

Problems with first generation routers

The routing table lookup speed cannot be improved if we use traditional memories.

The conventional bus structure for the interconnection is very inefficient.

Every packet has to cross the bus at least twice.

The whole packet (not just the header) is transferred.

Page 9

How fast should a router be?

An OC-48 link data rate: 2.488 Gbps

Packet rate is more important than the data rate.

The bottleneck is caused by the minimum packet size, which depends on the technology.

E.g. Packet-over-SONET (PoS): 40-byte IP packet + 6 bytes of PPP/HDLC overhead, carried at the 2.405 Gbps OC-48 payload rate:

2.405 Gbps / (8 x 46) = 6.53 MPPS

The aggregate packet rate for a 16-port system:

16 x 6.53 = 104.48 MPPS

One decision every 9.57 nsec.

SDRAM access time is about 10 ns for sequential locations, and in practice around 20-50 ns.
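
The arithmetic on this slide can be reproduced with a short script. It is a sketch only: the function name `packets_per_second` is an assumption, and because it does not round intermediate values its results differ slightly from the slide's figures in the last digits.

```python
def packets_per_second(payload_rate_bps: float, packet_bytes: int,
                       overhead_bytes: int) -> float:
    """Worst-case packet rate: payload rate divided by bits per minimum packet."""
    return payload_rate_bps / (8 * (packet_bytes + overhead_bytes))

OC48_PAYLOAD_BPS = 2.405e9   # payload rate used on the slide
PORTS = 16

per_port = packets_per_second(OC48_PAYLOAD_BPS, packet_bytes=40, overhead_bytes=6)
aggregate = PORTS * per_port
decision_ns = 1e9 / aggregate    # time budget per forwarding decision

print(f"per port : {per_port / 1e6:.2f} MPPS")
print(f"aggregate: {aggregate / 1e6:.2f} MPPS")
print(f"budget   : {decision_ns:.2f} ns per decision")
```

The budget of under 10 ns per decision is what makes a 20-50 ns memory access the limiting factor.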

Page 10

What is the solution?

Take advantage of parallelism:
- NICs became more intelligent and took care of most packet forwarding.
- We use ASICs in the NICs (line cards) for packet classification and forwarding.
- Most packets do not go to the CPU card (control card).

Switching interface:
- Use a switching element to pass packets between line cards directly and simultaneously.

Page 11

Modern Switch Based Architecture

Classification and forwarding decisions are done in line cards.

High speed interconnection mechanism (switching) between the line cards.

This provides a fast data path.

Standard CPU (RISC processor) is used for the control plane (slow path).

Hardware and/or software implementation for classification and forwarding in the line card.

Page 12

Functional Blocks in Modern Switches

The PHY Interface
- Responsible for transmitting and receiving information.
- Conversion of the bit stream from digital form to analog signal and vice versa.

Switch Fabric
- The router has a bus or a backplane.
- The switch fabric reads a packet from an input port and routes it to the output port.

Packet Processing
- Fast path (data path): handles all operations that are executed in real time on packets (e.g. framing/parsing, classification, modification, compression/encryption, queueing).
- Slow path (control path): operations executed outside the real-time packet flow (e.g. address resolution, route calculation, update of the routing table, …).

Host Processing
- Network management, configuring devices, diagnostics.
- Implemented in software on a CPU.

Page 13

Line cards in Modern Switches

Line card handles packet processing such as:

Classification

Forwarding

Traffic Policing and shaping

Monitoring and Statistics

Page 14

Data Path Diagram

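[Figure: data path through the line card (Optics, CDR & Serdes, Framer/Mapper, Network Processor, Traffic Manager, Switch Interface) and the switch card (Switching Element, Scheduling Element), shown for the ingress and egress directions. The figure also marks the packet processing units.]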

Source: Light Reading Report

Page 15

Data Path Functions

Network Processor
- Parse
- Identify flow
- Determine egress port
- Mark QoS parameters
- Append TM or SF header

Ingress Traffic Manager
- Police
- Manage congestion (WRED)
- Queue packets in class-based VOQs
- Segment packets into switch cells

Switch Fabric
- Queue cells in class-based VOQs
- Flow control TM per class-based VOQ
- Schedule class-based VOQs to egress ports

Egress Traffic Manager
- Reassemble cells into packets
- Shape outgoing traffic
- Schedule egress traffic

[Figure: incoming packets pass through WRED/discard, segmentation + header and the TM scheduler on the ingress line card; SF flow control and the SF arbiter in the switch fabric; reassembly, class-based queueing of outgoing packets and the egress scheduler & shaper on the egress line card.]

Page 16

Switch Card to Line Card Connection

This connection must pass through the backplane.

Serdes (Serializer-Deserializer) devices are used for this connection.
- Each serdes signal runs over two wires and two pins (differential-mode signal).
- The line speed is usually around 3.125 Gbps.
- They use line coding (8b/10b encoding), so the actual data rate is around 2.5 Gbps.
- There are attempts to provide 10 Gbps serdes.

Page 17

How many serdes do we need?

How fast should the connection between the switch card and the line card be?

The line speed is not enough.

Switch fabric throughput is less than 100% due to contention.

The network processor, traffic manager and switch fabric add their headers.

There is also a cell tax: the padding wasted when a packet does not fill a whole number of fixed-size cells.

Page 18

Speedup

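[Figure: line card elements and switch interface on the line card, switching element and scheduling element on the switch card, annotated with the rates RL, RTM and RSF. The overheads shown along the path are the traffic manager header, the switch fabric header and fragmentation (cell tax).]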

Effective Speedup = RSF/RTM

In commercial systems, speedup usually refers to RSF/RL.

A higher speedup factor:
- Increases system design complexity.
- Increases power consumption.
- Creates signal integrity issues.

The required speedup factor is around 2.

Page 19

Redundancy

We have spare switch cards and control cards in the system.

The redundancy models:
- Passive redundancy (N:1): we have one inactive switch card in the system that starts to work after a failure.
- Passive redundancy (1:1, N:N): for each active switch card, we have one inactive card.
- Load-sharing redundancy (N-1): all cards are active, and when a failure happens, performance degrades gracefully.
- Active redundancy (1+1): two sets of fabrics carry the same traffic.

Source: www.idt.com/content/switchblock.jpg

Page 20

Example

In a 16-port 10 Gbps switch with 2X speedup and N:N redundancy, how many 2.5 Gbps serdes do we need?

We need 20 Gbps active and 20 Gbps redundant data rate for each line card.

This means 16 serdes for each line card.

For 16 line cards we need 256 serdes in this system.
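
The same count can be checked with a few lines of Python. This is a minimal sketch; the function name and the `redundant_copies` parameter (1 for N:N redundancy, where every active link has one spare) are assumptions made for illustration.

```python
import math

def serdes_per_line_card(port_gbps: float, speedup: float,
                         serdes_gbps: float, redundant_copies: int = 1) -> int:
    """Serdes links one line card needs toward the switch cards."""
    active_gbps = port_gbps * speedup                   # 10 x 2 = 20 Gbps
    total_gbps = active_gbps * (1 + redundant_copies)   # plus 20 Gbps redundant
    return math.ceil(total_gbps / serdes_gbps)

LINE_CARDS = 16
per_card = serdes_per_line_card(port_gbps=10, speedup=2, serdes_gbps=2.5)
total = LINE_CARDS * per_card

print(per_card, total)   # 16 serdes per line card, 256 in the system
```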

Page 21

Example

What is the effective speedup of this system for 40-byte IP packets if the traffic manager header size is 12 bytes, the switch fabric header size is 8 bytes and the payload size of the cell is 52 bytes?

Solution:

In the slide 9 example we showed that there can be 6.53 MPPS (40-byte packets) on an OC-48 line.

Similarly, on an OC-192 line there can be up to 9.622 Gbps / (8 x 46) = 26.15 MPPS.

Each packet is encapsulated in one cell, since 40 + 6 < 52.

The maximum number of cells that a line card can send over its 8 active 2.5 Gbps serdes is

(2.5 x 8 Gbps) / ((52 + 8 + 12) x 8) = 34.722 million cells per second.

The effective speedup is

Speedup = 34.722 / 26.15 = 1.33
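
The calculation generalizes to other packet and cell sizes. The sketch below follows the same steps as the solution; the function and parameter names are assumptions, and packets larger than one cell payload are simply charged a whole number of cells (the cell tax).

```python
import math

def effective_speedup(line_payload_gbps: float, packet_bytes: int,
                      link_overhead_bytes: int, active_serdes: int,
                      serdes_gbps: float, cell_payload_bytes: int,
                      tm_header_bytes: int, sf_header_bytes: int) -> float:
    """Ratio of cell capacity toward the fabric to cells generated at line rate."""
    framed = packet_bytes + link_overhead_bytes
    packets_per_s = line_payload_gbps * 1e9 / (8 * framed)
    cells_per_packet = math.ceil(framed / cell_payload_bytes)    # cell tax
    cell_bits = 8 * (cell_payload_bytes + tm_header_bytes + sf_header_bytes)
    cell_capacity_per_s = active_serdes * serdes_gbps * 1e9 / cell_bits
    return cell_capacity_per_s / (packets_per_s * cells_per_packet)

speedup = effective_speedup(line_payload_gbps=9.622, packet_bytes=40,
                            link_overhead_bytes=6, active_serdes=8,
                            serdes_gbps=2.5, cell_payload_bytes=52,
                            tm_header_bytes=12, sf_header_bytes=8)
print(f"{speedup:.2f}")   # about 1.33 for the numbers on this slide
```

Although the nominal speedup is 2, headers and the cell format reduce the effective speedup for minimum-size packets to roughly 1.33.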

Page 22

Traces Per Serdes

Typical LVDS speed is 1.25 Gbps.

For 2.5 Gbps we need 2 channels.

LVDS is differential, i.e. 2 traces per channel.

LVDS is unidirectional, i.e. 2 for full duplex.

Full duplex 2.5 Gbps using LVDS requires 8 traces.

In the previous example we will have 256 x 8 = 2048 traces on the backplane.
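
As a quick check of the trace count, here is a small sketch (the function name and defaults are illustrative assumptions):

```python
import math

def traces_per_serdes(serdes_gbps: float, lvds_gbps: float = 1.25,
                      full_duplex: bool = True) -> int:
    """Backplane traces needed to carry one serdes link over LVDS channels."""
    channels = math.ceil(serdes_gbps / lvds_gbps)   # 2 channels for 2.5 Gbps
    traces = channels * 2                           # differential: 2 traces each
    return traces * 2 if full_duplex else traces    # one set per direction

per_serdes = traces_per_serdes(2.5)
backplane_traces = 256 * per_serdes

print(per_serdes, backplane_traces)   # 8 traces per serdes, 2048 in total
```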

Page 23

A sample Router (Cisco CRS-1)

Page 24

The line card chassis

8 service cards and 8 physical layer interface module cards

Page 25

Page 26

Physical layer Interface Module

Routing Processor (control plane)

Switching Card

16 slot Single-Shelf system

The distributed route processor (DRP) is an optional component that provides enhanced routing capabilities.
- The DRP contains two symmetric multiprocessors (SMPs), each of which performs routing functions.
- Processor-intensive tasks (such as BGP speakers and IS-IS) can be offloaded from the route processors (RPs) to the DRPs.

Page 27

Multishelf Systems

2 to 72 line card shelves

1 to 8 fabric card shelves

Page 28

How to handle packet processing?

Off-the-shelf CPU
- This is usually a RISC processor.
- In low-end systems it could be a CISC processor.

ASIC
- Specialized high-performance ASIC to handle packet processing.

Ideal approach for companies such as IBM and Intel, since they are manufacturers of integrated circuits.

Page 29

Off-the-shelf CPU Systems

Packet processing is implemented in software running on the CPU.

Modifications, upgrades and debugging are accomplished by simple software updates and downloads.

Update time is much shorter, which is good for both user and developer.

Not very efficient: many clock cycles are spent on tasks not related to packet processing.

The fastest off-the-shelf CPU can handle about 1 gigabit per second.

Trend is to do deeper packet processing (more on this later).

Page 30

Memory Bottleneck

The pipelined architecture of CPUs enables them to execute billions of instructions per second.

However, in order to keep the pipeline full they must continuously fetch data from memory and store it back.

This can be done with a very sophisticated multi-level hierarchy of different memory technologies and interleaved memory banks.

This comes at prohibitive cost, design complexity and power consumption.

Hence a typical processor pipeline often ends up empty, which reduces the system throughput.

Network traffic statistics models are completely different from local traffic on a computer bus. They do not have the same spatial and temporal locality properties. Hence, the typical processor’s cache systems are not effective.

Page 31

Sub-optimal Instruction Set

The instruction set that we need for packet processing requires specific bit-level operations.

These instructions should be executed at wire speed.

These instructions are not available as standard instructions of off-the-shelf CPUs.

Hence, we have to assemble multiple standard instructions to perform the intended functionality.
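
To illustrate the point, the sketch below extracts a few sub-byte fields from an IPv4 header using only generic operations. Each field costs a load, a shift and a mask on a general-purpose CPU. The function name `parse_fields` and the sample bytes are assumptions made for illustration; a bit-field extract instruction of the kind the slide has in mind would do each of these in one step.

```python
def parse_fields(header: bytes) -> dict:
    """Pull bit-level fields out of the first bytes of an IPv4 header."""
    first = header[0]
    flags_frag = (header[6] << 8) | header[7]        # 3 flag bits + 13-bit offset
    return {
        "version": first >> 4,                       # upper 4 bits of byte 0
        "ihl_words": first & 0x0F,                   # lower 4 bits of byte 0
        "dscp": header[1] >> 2,                      # upper 6 bits of byte 1
        "ecn": header[1] & 0x03,                     # lower 2 bits of byte 1
        "dont_fragment": bool(flags_frag & 0x4000),
        "more_fragments": bool(flags_frag & 0x2000),
        "fragment_offset": flags_frag & 0x1FFF,
    }

# Sample header: version 4, IHL 5, DSCP 46, Don't Fragment set.
sample = bytes([0x45, 0xB8, 0, 40, 0, 0, 0x40, 0x00, 64, 6, 0, 0,
                10, 0, 0, 1, 10, 0, 0, 2])
fields = parse_fields(sample)
print(fields)
```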

Page 32

Packet processing with ASIC

ASIC typically delivers higher performance.

An ASIC is not programmable:
- Adding new functionality requires a new design.
- Adding a new protocol requires a new design.
- A new design is costly for both the vendor and the user.

ASIC design is very time consuming:
- The design cycle takes 12 to 18 months.
- If we need some modification, we may need to recode the whole design.
- Many start-up failures are due to time delay.

Page 33

ASIC development is costly

Expensive and time-consuming to change.

For testing an ASIC you need to design a system.

Expensive development tools (design and verification).

Requires ASIC designers (much more expensive).

Tape-out of a chip costs around a million dollars.

Page 34

So is there a middle ground solution?

Can we have a technology that:

Has the flexibility of programmable processors

Has the high speed of ASICs

The solution is called a Network Processor!

Network processors are programmable like CPUs, but their performance is close to that of ASICs.

Page 35

Network Processors value proposition

Shorter time to market:
- Instead of 18 months, it takes about 6 months to complete the development cycle of the packet processing part.

Longer time in market:
- New features can be embedded into a deployed network-processor-based product.
- Increased time in market reduces the cost of product ownership over the life of the product.

Just-in-time delivery of new features:
- We can modify the design and add new features in the field without penalizing the customer.

Greater focus on other issues of business management:
- Most functions are already coded in a standard way.
- Developers can focus on differentiating features.

Page 36

Packet Processing Stages

1. Remove Link Layer Headers and Decryption:
- Ethernet
- PPP (Point-to-Point Protocol) frame
- PPP over ATM
- PPP over Ethernet over ATM

2. Identify Ingress Subscriber:
- Extract information from the link layer protocol header about the owner of the packet.

3. Filtering:
- Permit or deny specific traffic flows, based on various attributes of the IP and higher-layer headers.

4. Traffic Classification:
- Allow different traffic management, QoS, security and routing policies to be applied to different types of flows.

5. Traffic Metering, Marking & Policing:
- Control the Peak and Committed Information Rates.
- Determine the PHB in the DiffServ model (changing the priority).
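
Metering against a committed rate (stage 5) is commonly done with a token bucket. The slide does not name a specific algorithm, so the following is a hedged sketch of a single-rate token bucket with hypothetical names; timestamps are passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Single-rate meter: tokens accrue at the committed rate up to a burst size."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)   # start with a full bucket
        self.last = 0.0                    # time of the previous update, seconds

    def conforms(self, now: float, size_bytes: int) -> bool:
        """Return True if the packet is within profile, False if it exceeds it."""
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False                       # a policer would drop or re-mark here

meter = TokenBucket(rate_bps=8000, burst_bytes=1500)   # 1000 bytes/s committed
results = [
    meter.conforms(0.0, 1000),   # fits in the initial burst
    meter.conforms(0.0, 1000),   # only 500 tokens left
    meter.conforms(0.5, 1000),   # 500 more tokens have accrued
    meter.conforms(0.6, 200),    # only 100 tokens accrued since
]
print(results)
```

A peak-rate check can be added with a second bucket of the same form; out-of-profile packets are then dropped or marked with a lower-priority PHB.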

Page 37

Packet Processing Stages

6. Custom Routing Policies:
- Direct some traffic through specific paths (Internet, VPN, specific destination).
- A Virtual Private Routed Network allows users to network in privacy over their own routed network using their own private addresses.
- Send suspicious traffic to explicit locations for special processing.

7. NAT & NAPT (Network Address [Port] Translation):
- Address translation at the source if the user is using a private address space.
- Static one-to-one with NAT and dynamic many-to-one with NAPT.

8. Route Table Look-up:
- Best-matching-prefix look-up on the destination IP address.

9. Enforcing the PHB / Per-Flow (Link Sharing):
- Priority, WRR, WFQ scheduling, WRED (weighted random early detection).

10. Egress Side Processing (QoS, filtering, encryption, NAT, egress subscriber identification, traffic classification, link sharing)

11. Statistical Collection
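
The best-matching-prefix look-up of stage 8 can be illustrated with a binary trie. This is a sketch under stated assumptions, not the method of any particular router: the class name `PrefixTrie` and the next-hop labels are hypothetical, it handles IPv4 only, and hardware implementations typically use compressed tries or TCAMs instead.

```python
import ipaddress

class PrefixTrie:
    """Binary trie keyed on address bits; returns the longest matching prefix."""

    def __init__(self):
        self.root = {}

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.ip_network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):          # walk one bit per level
            bit = (bits >> (31 - i)) & 1
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, address: str):
        bits = int(ipaddress.ip_address(address))
        node = self.root
        best = node.get("hop")                  # default route, if any
        for i in range(32):
            node = node.get((bits >> (31 - i)) & 1)
            if node is None:
                break
            best = node.get("hop", best)        # remember the deepest match
        return best

table = PrefixTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "port A")
table.insert("10.1.0.0/16", "port B")

print(table.lookup("10.1.2.3"))      # port B (the /16 beats the /8)
print(table.lookup("10.2.0.1"))      # port A
print(table.lookup("192.168.1.1"))   # default
```

Each look-up may touch up to 32 trie nodes at scattered memory locations, which is why this stage is dominated by memory access time.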

Page 38

Deep Packet Processing

In deep packet processing we need to look at the contents of the packet not just the header.

Why do we need deep packet processing?
- Deep packet inspection for firewalls and intrusion detection systems.
- Traffic shaping or discarding of P2P traffic.
- Server load balancing: distribution of traffic among servers based on the web destination.
- Network monitoring and analysis.

Page 39

Packet Processing Implementation issues

We need multiple table look-ups for each packet.

Access to the whole packet, not just the IP header, is necessary.

There can be tens of thousands of simultaneously active subscribers comprising millions of application flows.

In a fully loaded Gigabit Ethernet connection, about 1.5 million packets per second must be processed.

Modern general-purpose processors are optimized for numeric computation rather than for processing packets.

Memory read and write speeds become bottlenecks.

Caching and high-speed memory burst capability do not help, since packet processing requires:
- Large tables
- Short entries
- Random access queries
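
The figure of about 1.5 million packets per second can be reproduced as follows. The sketch assumes minimum-size 64-byte Ethernet frames plus the 8-byte preamble and 12-byte inter-frame gap that every frame occupies on the wire; the slide itself does not spell out these assumptions.

```python
LINK_BPS = 1e9            # Gigabit Ethernet
MIN_FRAME_BYTES = 64      # minimum Ethernet frame
PREAMBLE_BYTES = 8        # preamble + start-of-frame delimiter
INTERFRAME_GAP_BYTES = 12

wire_bits = 8 * (MIN_FRAME_BYTES + PREAMBLE_BYTES + INTERFRAME_GAP_BYTES)
max_pps = LINK_BPS / wire_bits
budget_ns = 1e9 / max_pps   # time available per packet

print(f"{max_pps / 1e6:.3f} Mpps, {budget_ns:.0f} ns per packet")
```

At roughly 670 ns per packet, a handful of random memory accesses per look-up already consumes a large share of the budget.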

Page 40

How do network processors do this?

Specialized circuitry and micro-engines to perform all generic packet processing functions.

They also usually embed a major programmable module, usually a tailor-made RISC CPU (and sometimes more than one).
- Real-time operating system
- Handshake communication with other parts of the system

Page 41

Network Processor Categories

Platform Network Processors objectives:
- Handle most packet processing functions
- Minimize the number of components and the hardware cost
- Optimize the trade-off between performance and flexibility
- Accelerate the software development cycle

Peripheral Network Processors
- Designed to optimize a specific function
- Compressor chips
- IP security

Page 42

The other side of the argument

Every single task can be done at wire speed.

What about multiple tasks at the same time?

What is a realistic scenario to consider?

Challenge of Benchmarking

Programming complexity