1 High-performance TCAM-based IP Lookup Engines Authors: Hui Yu, Jing Chen, Jianping Wang and S.Q. Zheng Publisher: IEEE INFOCOM 2008 Presenter: 林呈俞 Date:

1

High-performance TCAM-based IP Lookup Engines

Authors: Hui Yu, Jing Chen, Jianping Wang and S.Q. Zheng

Publisher: IEEE INFOCOM 2008

Presenter: 林呈俞

Date: 2008/9/24

2

Outline

Introduction

• Previous works

• MSMB scheme

• MSMB-PT scheme

MSMB-LPT scheme

Goals of this paper

Proposed works

• M-MSMB-LPT scheme

• MSMB-LPT-I scheme

Experimental results

3

Introduction (1/3)

To achieve high IP lookup performance, it has been proposed to use TCAMs to implement IP-lookup accelerators.

One TCAM-based routing table is shared by multiple packet streams in one line card or multiple line cards in practice.

Previous works reconfigure a TCAM into several independent blocks:

• MSMB

• MSMB – PT

• MSMB – LPT

4

Introduction (2/3)

MSMB (Multi-Selector and Multi-Block) scheme

• Proposed in [6] to reconfigure a TCAM into several independent blocks so that parallel IP lookup is possible.

• With K TCAMs, instead of performing only one lookup in each cycle, all TCAMs can be used concurrently for different lookups.

• One would need M parallel RDs for this system.

5

Introduction (3/3)

MSMB-PT (Popular-Prefix Table) scheme

• This scheme is based on the temporal locality of packet destinations.

• It aims to alleviate the TCAM contention problem caused by traffic bias.

Popular-Prefix Table (PT): caches some of the prefixes recently used by all inputs.

6

MSMB-LPT (Local PT) (1/2)

A flow is a stream of packets that are transmitted as a bursty sequence.

For a given router R, the packets of flows arriving at the same input of R bias the IP stream toward a small set of IP prefixes.

For any bursty traffic period of an input of R, this bias of IP addresses is called the temporal locality of flows.

The major difference between MSMB-LPT and MSMB-PT is as follows:

• MSMB-PT: captures temporal locality global to all inputs.

• MSMB-LPT: captures temporal locality of flows.

MSMB-LPT improves on the performance of MSMB-PT by up to 250% (speedup), 80% (hit ratio), 82% (TCAM contention), and 71% (TCAM power consumption).

LPT helps to reduce the number of accesses to the TCAM blocks and the number of TCAM contentions.

7

MSMB – LPT (Local PT) (2/2)

Local Popular-Prefix Table (LPT): used to dynamically store recently referenced IP prefixes requested from input i.

Contention Resolver (CR): chooses one request according to a priority scheme and passes it to the TCAM.
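
The role of an LPT can be sketched as a small per-input cache in front of the TCAM blocks. The sketch below is only an illustration: the class name `LocalPrefixTable`, its methods, and the next-hop strings are hypothetical, and FIFO replacement is assumed because that is the policy used in the experiments later in this deck. It also ignores a known subtlety of prefix caching (a cached short prefix can shadow a longer prefix that is not in the cache), which a real design has to handle.

```python
import ipaddress
from collections import OrderedDict


class LocalPrefixTable:
    """Hypothetical per-input LPT: caches up to n prefixes, FIFO replacement."""

    def __init__(self, n):
        if n < 1:
            raise ValueError("LPT size n must be at least 1")
        self.n = n
        # Maps IPv4Network -> next hop; insertion order doubles as FIFO order.
        self.entries = OrderedDict()

    def lookup(self, addr):
        """Return the next hop of the longest cached matching prefix, or None on a miss."""
        ip = ipaddress.ip_address(addr)
        best_net, best_hop = None, None
        for net, hop in self.entries.items():
            if ip in net and (best_net is None or net.prefixlen > best_net.prefixlen):
                best_net, best_hop = net, hop
        return best_hop

    def insert(self, prefix, next_hop):
        """Cache a prefix returned by a TCAM search, evicting the oldest entry if full."""
        net = ipaddress.ip_network(prefix)
        if net in self.entries:
            return  # FIFO: a repeated reference does not refresh the entry
        if len(self.entries) >= self.n:
            self.entries.popitem(last=False)  # drop the oldest inserted prefix
        self.entries[net] = next_hop
```

On a hit the lookup completes without touching any TCAM block; only a miss (a `None` result here) would be forwarded to a CR, which is how the LPT reduces TCAM accesses and contention.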

8

Goals of this paper

How to design a TCAM-based IP lookup engine that

• improves MSMB-LPT without using more HW resources?

• satisfies given performance requirements?

For large m (inputs)

• How to design a scalable TCAM-based IP lookup engine?

• How to find tradeoffs among cost, performance and reliability?

9

Proposed work (1/5)

Definitions:

• MSMB-LPT has a configuration (m, n, k)

• m inputs

• k TCAM blocks

• LPT of size n

• Total number of prefixes M (each block contains M/k prefixes).

The parameters m and k are carefully selected to achieve optimized cost and performance.

Are there better MSMB schemes for given m and k?

Two proposed schemes:

• M – MSMB – LPT

• MSMB – LPT – I

10

Proposed work (2/5)

Multiple (M)-MSMB-LPT

• For large m (inputs), we propose to use w identical copies of MSMB-LPT of configuration (m’, n, k), where m’ = m / w.

• Input i*m’ + j serves as the j-th input of the (i+1)-th MSMB-LPT.
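
The input assignment above can be written as a small index computation. This is a minimal sketch; the function name `locate_input` and the 1-based numbering of inputs and copies are assumptions made for illustration.

```python
def locate_input(g, m, w):
    """Map global input g (1..m) to (copy, local_input), both 1-based.

    Input i*m' + j is the j-th input of the (i+1)-th MSMB-LPT, with m' = m / w.
    """
    if m % w != 0:
        raise ValueError("m must be divisible by w")
    if not 1 <= g <= m:
        raise ValueError("input index out of range")
    m_prime = m // w
    i, j_minus_1 = divmod(g - 1, m_prime)
    return i + 1, j_minus_1 + 1
```

For example, with m = 12 and w = 2 (so m’ = 6), inputs 1 to 6 go to the first MSMB-LPT and inputs 7 to 12 go to the second.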

11

Proposed work (3/5)

Multiple (M)-MSMB-LPT

The w TCAM blocks TCAMj,u, where j = 1 ~ w, have the same content as TCAMu in MSMB-LPT.

We say that an M-MSMB-LPT has configuration (m, n, w, k) if it has w MSMB-LPTs of configuration (m’, n, k).

In an M-MSMB-LPT scheme, the w MSMB-LPTs operate completely independently.

[Figure: the j-th component MSMB-LPTj, with inputs (j-1)*m’ + 1, (j-1)*m’ + 2, …, j*m’ feeding k CRs and k TCAM blocks.]

12

Proposed work (4/5)

MSMB-LPT-Interleaved TCAMs (MSMB-LPT-I)

• An MSMB-LPT-I of configuration (m, n, w, k) has

• m inputs, each with an LPT of size n.

• wk TCAM blocks that are partitioned into k groups, each called a TCAM bundle.

The w TCAM blocks in the j-th TCAM bundle contain the same content as TCAMj in the MSMB-LPT scheme.

[Figure: inputs 1, 2, …, m connected to k TCAM bundles.]
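
The bundle layout can be illustrated with a few lines of code. This is only a sketch of the data layout described on this slide: the function name `build_bundles` and the representation of a TCAM block as a plain list of prefix strings are assumptions.

```python
def build_bundles(msmb_lpt_blocks, w):
    """Replicate each of the k MSMB-LPT blocks w times.

    msmb_lpt_blocks[j] is the prefix list of TCAM block j in MSMB-LPT.
    Returns bundles such that bundles[j][u] is the u-th replica in bundle j.
    """
    if w < 1:
        raise ValueError("w must be at least 1")
    return [[list(block) for _ in range(w)] for block in msmb_lpt_blocks]


# Hypothetical example: k = 2 blocks, w = 3 replicas per bundle.
blocks = [["10.0.0.0/8", "10.1.0.0/16"], ["192.168.0.0/16"]]
bundles = build_bundles(blocks, w=3)
total_blocks = sum(len(bundle) for bundle in bundles)  # w * k blocks in total
```

Because every block inside a bundle holds the same prefixes, any free block in bundle j can serve a key that MSMB-LPT would have sent to TCAMj.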

13

Proposed work (5/5)

[Algorithm figure: processes run concurrently for i = 1 ~ m and j = 1 ~ k, handling the ni-th key from input i.]

The concurrent TCAM-search processes are coordinated by CR, which can be implemented as a round-robin m-to-w selector.
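
A round-robin m-to-w selector can be sketched as follows. This is a hedged illustration, not the paper's design: the class name `RoundRobinSelector`, the 0-based input indices, and the rule that priority restarts just after the last granted input are all assumptions.

```python
class RoundRobinSelector:
    """Hypothetical CR: grants at most w of m competing requests per cycle."""

    def __init__(self, m, w):
        self.m = m
        self.w = w
        self.next_input = 0  # 0-based input with the highest priority next cycle

    def select(self, requesting):
        """Return the inputs (0-based) granted a TCAM block in this cycle.

        requesting: set of input indices that want to search this bundle.
        """
        granted = []
        for offset in range(self.m):
            i = (self.next_input + offset) % self.m
            if i in requesting:
                granted.append(i)
                if len(granted) == self.w:
                    break
        if granted:
            # Rotate priority so the input after the last winner goes first.
            self.next_input = (granted[-1] + 1) % self.m
        return granted
```

Requests that are not granted in a cycle are the contentions; they stay pending and gain priority in later cycles as the pointer rotates.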

14

Experimental results (1/9)

We conduct a series of simulations on M-MSMB-LPT and MSMB-LPT-I.

• First-in-first-out (FIFO) replacement policy is used for LPT update.

• Round-robin (RR) arbitration is used for TCAM contention resolution.

Two packet traces are used in the simulations.

• 1. Generated according to the routing table described in [17].

• 2. Derived from actual packet flows given in [19].

The performance of an M-MSMB-LPT is determined by a single component MSMB-LPT.

The performance of MSMB-LPT and M-MSMB-LPT can be derived from the performance of MSMB-LPT-I with configurations (m, n, w, k) as follows.

• (m, n, 1, k) = MSMB-LPT with (m, n, k).

• (m, n, 1, k) = M-MSMB-LPT with (w*m, n, w, k).

• Example: MSMB-LPT-I with (6, n, 1, 4) can be used to indicate the performance of M-MSMB-LPT with (12, n, 2, 4) as well as (18, n, 3, 4).
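
The configuration equivalence above amounts to dividing the number of inputs by w. A minimal sketch, in which the function name `equivalent_interleaved_config` and the concrete LPT size n = 8 are hypothetical choices for illustration:

```python
def equivalent_interleaved_config(m, n, w, k):
    """Map an M-MSMB-LPT configuration (m, n, w, k) to the MSMB-LPT-I
    configuration (m / w, n, 1, k) that indicates its performance."""
    if m % w != 0:
        raise ValueError("m must be divisible by w")
    return (m // w, n, 1, k)


n = 8  # hypothetical LPT size, used only to make the example concrete
first = equivalent_interleaved_config(12, n, 2, 4)
second = equivalent_interleaved_config(18, n, 3, 4)
```

Both calls yield the same MSMB-LPT-I configuration with 6 inputs, matching the example on this slide.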

[Annotations on the configuration tuple: # bundles, # blocks.]

15

Experimental results (2/9)

Performance metrics

• TCAM contention ratio: defined in terms of the number of contentions at TCAM blocks and the total number of key searches.

• Speedup over naïve MSMB: defined in terms of the total number of parallel cycles needed to complete IP lookup for all packets in a trace.

• TCAM utilization: defined in terms of AMSMB-LPT-I(j), the total number of cycles in which the TCAMj blocks are searched.

16

Experimental results (3/9)

• Power consumption

17

Experimental results (4/9)

Speedup

[Figure: speedup, with panels for 48 TCAM blocks and 16 TCAM blocks.]

18

Experimental results (5/9)

Power consumption

19

Experimental results (6/9)

Contention ratio

• 36 inputs and 4 TCAM blocks in each bundle.

• Increase the number of TCAM bundles.

• From 1 to 2

• From 4 to 6

[Figure: contention ratio for configurations (36, n, w, 4), w = 1, 2, 4, 6.]

20

Experimental results (7/9)

Given the available TCAM resources, such as

• # TCAM bundles: 2

• # TCAM blocks in each bundle: 4

it is important to know the expected contention ratio under different numbers of inputs.

[Figure: contention ratio for configurations (m, n, 2, 4), m = 6, 12, 18, 36.]

21

Experimental results (8/9)

Speedup gain from increasing the number of TCAM bundles for a given number of inputs.

[Figure: speedup for configurations (36, n, w, 4), w = 1, 2, 4, 6.]

22

Experimental results (9/9)

The speedup changes with the number of inputs.

[Figure: speedup for configurations (m, n, 2, 4), m = 6, 12, 18, 36.]
