Guarantee IP Lookup Performance
with FIB Explosion
Tong Yang (ICT), Gaogang Xie (ICT), Yanbiao Li (HNU), Qiaobin Fu (ICT)
Alex X. Liu (MSU), Qi Li (ICT), Laurent Mathy (ULG)
Performance Issue in IP Lookup
FIBs keep growing: about 15% per year; FIB size has reached 512,000 entries.
The 512k bug: in August 2014, Cisco warned that web browsing speeds could slow over the following week as old hardware was upgraded to handle FIBs larger than 512K entries.
[Figure: FIB size over time, now crossing the 512k mark]
Motivation
On-chip vs. off-chip memory: on-chip is about 10 times faster, but limited in size. As FIBs keep growing, they no longer fit on chip, so almost all packets require off-chip accesses.
An ideal IP lookup algorithm combines:
– a constant yet small on-chip memory footprint for the FIB
– a constant yet fast lookup speed (low time complexity)
State-of-the-art
Achieving constant IP lookup time
– TCAM-based
– Trie pipeline using FPGA
– full-expansion
– DIR-24-8
Achieving small memory
– Based on Bloom Filter
– Level compression, path compression
– LC-trie
How to satisfy both constant lookup time and small on-chip memory usage?
SAIL Framework
Observation: almost all packets match prefixes of length 0~24.
Two-dimensional splitting:
– Splitting the lookup process: finding the prefix length vs. finding the next hop
– Splitting by prefix length: 0~24 vs. 25~32
On-chip: bit map arrays for finding the prefix length (lengths 0~24).
Off-chip: next hop arrays for lengths 0~24, plus all tables for lengths 25~32.
[Figure: per-level bit map arrays (on-chip) paired with next hop arrays (off-chip)]
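The per-level layout can be sketched as follows (a toy Python sketch, not the paper's implementation; `B`, `N`, and the helper names are illustrative):

```python
# Sketch: per-level bit map arrays (on-chip in SAIL) paired with
# next-hop arrays (off-chip). B[l][i] == 1 means the trie has a solid
# node for the i-th bit string of length l; N[l][i] holds its next hop.
LEVELS = 4  # toy size; SAIL keeps levels 0..24 on chip

B = [[0] * (1 << l) for l in range(LEVELS + 1)]  # bit map arrays
N = [[0] * (1 << l) for l in range(LEVELS + 1)]  # next-hop arrays

def insert(prefix_bits, length, next_hop):
    """Mark a prefix of the given length as a solid node."""
    B[length][prefix_bits] = 1
    N[length][prefix_bits] = next_hop

def longest_match(addr_bits, addr_len):
    """Scan levels from long to short; return the next hop of the
    longest matching prefix, or None if nothing matches."""
    for l in range(min(addr_len, LEVELS), -1, -1):
        idx = addr_bits >> (addr_len - l)
        if B[l][idx]:
            return N[l][idx]
    return None

insert(0b1, 1, 4)                # 1*/1  -> next hop 4
insert(0b01, 2, 3)               # 01*/2 -> next hop 3
print(longest_match(0b0110, 4))  # matches 01* -> 3
```

Finding the prefix length only touches the bit maps; the next-hop array is read once, after the length is known.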
Splitting
The original trie is split: levels 0~24 hold the short prefixes, levels 25~32 the long prefixes.
Keeping one bit map per level 0~24 on chip costs at most: sum of 2^i for i = 0..24 ≈ 2^25 bits = 4 MB.
How to avoid searching both short and long prefixes?
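The 4 MB figure is just the count of all possible trie nodes in levels 0~24 at one bit each; a quick arithmetic check:

```python
# One bit per possible trie node at each level 0..24:
total_bits = sum(2**i for i in range(25))   # = 2**25 - 1 = 33554431
total_mb = total_bits / 8 / 2**20           # bits -> bytes -> MB
print(total_mb)                             # just under 4.0
```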
Pivot Pushing & Lookup

Example FIB:
prefix      next hop
*/0         6
1*/1        4
01*/2       3
001*/3      3
111*/3      7
0011*/4     1
1110*/4     8
11100*/5    2
001011*/6   9

[Figure: (a) the trie for this FIB (nodes A–H, O), (b) the trie after pivot pushing to level 4, (c) bit maps B0–B4 with next hop arrays N3, N4]

Lookup 001010, pivot level 4:
– B4[001010 >> 2] = 1 → the address hits a node at the pivot level
– N4[2] = 0 → the stored next hop is the marker for a long prefix: continue the lookup off-chip
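The example can be sketched in code (a toy sketch, not the paper's implementation: pivot level 4 instead of 24, a plain dict standing in for the off-chip long-prefix table, next hop 0 used as the pivot marker as in the figure, and the handling of 1110*/4 by leaf-pushing its next hop to level 5 is my assumption):

```python
# Toy SAIL_B-style lookup after pivot pushing, pivot level 4.
PIVOT = 4
B = [[0] * (1 << l) for l in range(PIVOT + 1)]  # bit maps, levels 0..4
N = [[0] * (1 << l) for l in range(PIVOT + 1)]  # next-hop arrays

def add(bits, length, hop):
    B[length][bits] = 1
    N[length][bits] = hop

# The example FIB, lengths 0..4; next hop 0 at the pivot is a marker.
add(0b0, 0, 6)       # */0    -> 6
add(0b1, 1, 4)       # 1*/1   -> 4
add(0b01, 2, 3)      # 01*/2  -> 3
add(0b001, 3, 3)     # 001*/3 -> 3
add(0b111, 3, 7)     # 111*/3 -> 7
add(0b0011, 4, 1)    # 0011*/4 -> 1
add(0b1110, 4, 0)    # marker: descendants of 1110 live off-chip
add(0b0010, 4, 0)    # marker: 001011*/6 lives off-chip

# Off-chip table for prefixes longer than the pivot; 1110*/4's next
# hop 8 is leaf-pushed to 11101 (11100 keeps the /5 entry's 2).
long_table = {(0b11100, 5): 2, (0b11101, 5): 8, (0b001011, 6): 9}

def lookup(addr, addr_len=6):
    for l in range(PIVOT, -1, -1):          # longest level first
        idx = addr >> (addr_len - l)
        if B[l][idx]:
            if l == PIVOT and N[l][idx] == 0:
                # marker hit: search the off-chip long-prefix table
                for length in range(addr_len, PIVOT, -1):
                    hop = long_table.get((addr >> (addr_len - length), length))
                    if hop is not None:
                        return hop
                continue                    # no long match: fall back
            return N[l][idx]
    return None

print(lookup(0b001010))  # marker at B4[0010], no long match -> 001*/3 -> 3
print(lookup(0b001011))  # marker, then off-chip hit 001011*/6 -> 9
```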
Update of SAIL_B

(Same example FIB, trie, and bit maps as on the previous slide.)

– Insert 10*: set B2[10] = 1 (one on-chip memory access)
– Delete 111*: set B3[111] = 0 (one on-chip memory access)
– Changing the next hop of 001*, or inserting 0010*: only the off-chip tables need updating
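A minimal sketch of why these updates cost one on-chip access each (bit map names as in the figure; the off-chip next-hop bookkeeping is omitted):

```python
# Sketch: SAIL_B updates touch at most one on-chip bit map entry.
B2 = [0] * 4           # bit map for level 2 (indices 00..11)
B3 = [0] * 8           # bit map for level 3
B3[0b111] = 1          # 111*/3 is present initially

def insert_prefix(bitmap, bits):
    bitmap[bits] = 1   # one on-chip write (next hop goes off-chip)

def delete_prefix(bitmap, bits):
    bitmap[bits] = 0   # one on-chip write

insert_prefix(B2, 0b10)    # insert 10*  -> B2[10] = 1
delete_prefix(B3, 0b111)   # delete 111* -> B3[111] = 0
print(B2[0b10], B3[0b111])
```

Changing the next hop of an existing prefix, or inserting a prefix whose bit is already set, never touches the bit maps at all.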
Optimization
SAIL_B
– Lookup: 25 on-chip memory accesses in worst case
– Update: 1 on-chip memory access
Lookup Oriented Optimization (SAIL_L)
– Lookup: 2 on-chip memory accesses in worst case
– Update: unbounded, low average update complexity
Update Oriented Optimization (SAIL_U)
– Lookup: 4 on-chip memory accesses in worst case
– Update: 1 on-chip memory access
Extension: SAIL for Multiple FIBs (SAIL_M)
SAIL_L

Prefixes are pushed to levels 16, 24, and 32.

[Figure: SAIL_L lookup flowchart]
– If B16 bit == 0: the next hop is in N16 (resolved on chip)
– Else if B24 bit == 0: the next hop is in N24 (resolved on chip)
– Else: the next hop is in N32 (one off-chip access)
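The decision chain can be sketched as follows (a toy Python sketch under my own assumptions: separate per-level bit map and next-hop arrays, dicts standing in for the level-24 and off-chip level-32 tables, and a hypothetical FIB of 10.0.0.0/8 plus 10.1.2.0/24):

```python
# Sketch of the SAIL_L lookup chain for 32-bit IPv4 addresses:
# at most 2 on-chip accesses (levels 16 and 24), then one off-chip
# access (level 32) for the longest prefixes.
B16 = [0] * (1 << 16)   # bit == 1: longer prefixes exist below
N16 = [0] * (1 << 16)   # next hops for prefixes pushed to level 16
B24 = {}                # dicts stand in for the level-24 arrays
N24 = {}
N32 = {}                # stands in for the off-chip level-32 table

def lookup(addr):
    i16 = addr >> 16
    if B16[i16] == 0:
        return N16[i16]        # resolved by the 1st on-chip access
    i24 = addr >> 8
    if B24.get(i24, 0) == 0:
        return N24.get(i24)    # resolved by the 2nd on-chip access
    return N32.get(addr)       # longest prefixes: one off-chip access

# Hypothetical FIB: 10.0.0.0/8 -> hop 1, plus 10.1.2.0/24 -> hop 2.
base = 10 << 24
for i in range(1 << 8):                 # leaf-push the /8 to level 16
    N16[(base >> 16) + i] = 1
i16 = (base + (1 << 16)) >> 16          # the 10.1.0.0/16 region
B16[i16] = 1                            # longer prefixes under 10.1
for i in range(1 << 8):                 # re-push the /8 to level 24
    N24[(i16 << 8) + i] = 1
N24[(base + (1 << 16) + (2 << 8)) >> 8] = 2   # 10.1.2.0/24 -> hop 2

print(lookup(base + (5 << 16) + 7))             # 10.5.0.7 -> 1
print(lookup(base + (1 << 16) + (2 << 8) + 9))  # 10.1.2.9 -> 2
```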
SAIL_U

• Pushing to levels 6, 12, 18, and 24.
• Adjacent levels are at most 6 apart, so one update affects at most 2^6 = 64 bits in a bitmap array, which fit in a single memory word.
• Thus at most one on-chip memory access is still enough for each update.
SAIL_M

For multiple FIBs (e.g. virtual routers), merge the individual tries into one overlay trie; each overlay node stores one next hop per FIB.

[Figure: (a) Trie 1 (A: 00*, C: 10*, G: 110*), (b) Trie 2 (A: 00*, C: 10*, E: 100*), (c) the overlay trie covering A: 00*, B: 01*, E: 100*, F: 101*, G: 110*, H: 111*]
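The merge step can be sketched as collecting, for every prefix, a next-hop vector with one slot per FIB (a toy sketch; dicts stand in for the tries, and the leaf-pushing shown in the figure is omitted):

```python
# Sketch: merging per-FIB tries into one overlay trie. Each overlay
# node maps (prefix_bits, length) -> [hop_for_fib0, hop_for_fib1, ...].
def build_overlay(fibs):
    """fibs: list of dicts {(bits, length): next_hop}, one per FIB."""
    overlay = {}
    for fib_id, fib in enumerate(fibs):
        for key, hop in fib.items():
            vec = overlay.setdefault(key, [None] * len(fibs))
            vec[fib_id] = hop
    return overlay

trie1 = {(0b00, 2): 'A', (0b10, 2): 'C', (0b110, 3): 'G'}
trie2 = {(0b00, 2): 'A', (0b10, 2): 'C', (0b100, 3): 'E'}
overlay = build_overlay([trie1, trie2])

print(overlay[(0b00, 2)])   # shared prefix: ['A', 'A']
print(overlay[(0b110, 3)])  # only in FIB 0: ['G', None]
```

A lookup then walks the overlay trie once and indexes the resulting vector by FIB id, so the on-chip bit maps are shared by all FIBs.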
SAILs in worst case

Algorithm   On-chip memory   Lookup (on-chip)   Update (on-chip)
SAIL_B      = 4 MB           25                 1
SAIL_L      ≤ 2.13 MB        2                  unbounded
SAIL_U      ≤ 2.03 MB        4                  1
SAIL_M      ≤ 2.13 MB        2                  unbounded

Worst case: 2 off-chip memory accesses per lookup.
Implementations

FPGA: Xilinx ISE 13.2 IDE; Xilinx Virtex-7 device; 8.26 MB on-chip memory
– SAIL_B, SAIL_U, and SAIL_L
Intel CPU: Core i7-3520M, 2.9 GHz; 64 KB L1, 512 KB L2, 4 MB L3; 8 GB DRAM
– SAIL_L and SAIL_M
GPU: NVIDIA Tesla C2075 (1147 MHz, 5376 MB device memory, 448 CUDA cores) with an Intel Xeon E5-2630 (2.30 GHz, 6 cores)
– SAIL_L
Many-core: Tilera TLR4-03680, 36 cores, 256 KB L2 cache per core
– SAIL_L
Evaluation
FIBs
– Real FIB from a tier-1 router in China
– 18 real FIBs from www.ripe.net
Traces
– Real packet traces from the same tier-1 router
– Randomly generated packet traces
– Packet traces generated according to the FIBs
Compared with
– PBF [SIGCOMM 03]
– LC-trie [used in the Linux kernel]
– Tree Bitmap
– Lulea [SIGCOMM 97 best paper]
FPGA Simulation

[Figure: on-chip memory usage (up to 1.2 MB) per FIB (rrc00–rrc15), SAIL_L vs. PBF]

SAIL algorithm   Lookup speed   Throughput
SAIL_B           351 Mpps       112 Gbps
SAIL_U           405 Mpps       130 Gbps
SAIL_L           479 Mpps       153 Gbps
Intel CPU: real FIB and traces

[Figure: lookup speed (Mpps, up to 800) of LC-trie, TreeBitmap, Lulea, and SAIL_L]
Intel CPU: 12 FIBs using prefix-based and random traces

[Figure: lookup speed (Mpps) across the 12 FIBs, prefix-based traffic vs. random traces]
Intel CPU: Update

[Figure: memory accesses per update (0–14) over sequences of updates (×500) for rrc00, rrc01, and rrc03, with per-FIB averages]
GPU: Lookup speed vs. batch size

[Figure: lookup speed (Mpps) per FIB (rrc00–rrc15) for batch sizes 30, 60, and 90]
GPU: Lookup latency vs. batch size

[Figure: lookup latency (microseconds) per FIB (rrc00–rrc15) for batch sizes 30, 60, and 90]
Tilera GX-36: Lookup speed vs. # of cores

[Figure: lookup speed (pps, up to 700M) as the number of cores grows from 2 to 34]
Conclusion

Two-dimensional splitting framework: SAIL
Three optimization algorithms
– SAIL_U, SAIL_L, SAIL_M
– At most 2.13 MB on-chip memory usage
– At most 2 off-chip memory accesses per lookup
Suitable for different platforms
– FPGA, CPU, GPU, many-core
– Up to 673.22~708.71 Mpps
Future work: extending SAIL to IPv6 lookup
Source code of SAIL, LC-trie, Tree Bitmap, and Lulea:
http://fi.ict.ac.cn/firg.php?n=PublicationsAmpTalks.OpenSource

Thanks!
http://fi.ict.ac.cn