Linux Kongress 2008
Hamburg/Germany
Robert Olsson, Uppsala University
2008-10-10
Over 10 years in production
● Three major installations
● UU core routers towards SUNET
● UU Student Network, 30,000 students
● ftp.sunet.se
Over 10 years in production: UU facts
● Over 25,000 registered hosts
● Dual ISP BGP connect, GIGE
● Local DMZ BGP peering, GIGE
● IPv4/IPv6
● OSPFv2/OSPFv3
● 600 netfilter rules
● 10 Cisco 6500 OSPF routers
● Redundant power
● 10g planned
Over 10 years in production: The SUNET FTP archive
[Figure: network diagram. Labels: four ftp servers, DMZ, AS15980, AS1653, two Juniper routers, two Bifrost/Linux routers, 10TB]
● Router discovery (IRDP)
● Full Internet routing, IPv4 and IPv6
Over 10 years in production: Student Network facts
● Dual ISP BGP connect, GIGE
● Local DMZ BGP peering, GIGE
● IPv4
● IRDP (ICMP)
● About 30 netfilter rules
● 19 netlogin-service boxes for premises
● Very “innovative” users
● Well connected
● 10g planned
Testing, Verification, Development & Research
● Started out as simple testing.
● Curiosity, Open Source, Collaboration
● Relative freedom; the idea was to use it in our own infrastructure. No need for external funding.
● The OS was intended for desktops.
Building Blocks
Hardware:
● PC: Motherboard/CPU/Memory
● Network Interfaces: GIGE/10g, WiFi etc
Software:
● Operating System: Linux/BSD/Microsoft
● Applications: Routing Daemons (Quagga/XORP), IP-login/netlogon
Network:
● Cable, Fiber, Copper
● Equipment, Switches
Testing, Verification, Development & Research
No need for a test network. We could test in our own infrastructure. (Or SLU)
We could work on complicated issues
● NAPI: 3 years
● Pktgen: 2 years
● fib_trie: 1 year
● TRASH: 1 year
● Hardware testing: many years
Flexible netlab at Uppsala University
[Figure: test generator (Linux), Ethernet, tested device, Ethernet, sink device (Linux)]
* Raw packet performance
* TCP
* Timing
* Variants
El cheapo. Highly customizable. We write code :-)
Latest & Greatest Hardware
Intel 10g board, chipset 82598
Open chip specs. Thanks Intel!
But why fixed XFPs?? Better classifier needed.
Quad vs Dual Core Opteron
2U Hi-End Opteron box, TYAN S2927/Barcelona
[Figure: bar chart comparing DualCore 2222 3.0 GHz and QuadCore 2365 2.3 GHz; y-axis 0 to 900]
Surprising!
One CPU core of the 2.3 GHz Quad-Core is faster than one of the 3.0 GHz Dual-Core.
L3 cache, Microcode?
Bifrost concept
● Linux kernel collaboration
● Performance testing, development of tools and testing techniques
● Hardware validation, support from big vendors
● Detect and cure problems in lab not in the network infrastructure.
● Test deploy (Often in own network)
Kernel footprints
● HW_FLOWCONTROL
● Tulip
● FASTROUTE path
● Whitehole device. In the middle of dev.c
● Hardwired IP addresses. (Russian?)
Overall Effect
● Inelegant handling of heavy net loads
– System collapse
● Scalability affected
– System and number of NICs
● A single hogger netdev can bring the system to its knees and deny service to others
Summary 2.4 vs feedback
[Figure: line chart; x-axis 0 to 100, y-axis 0 to 60, axis labels not recoverable]
March 15 report on lkml
Thread: "How to optimize routing performance"
reported by [email protected]
- Linux 2.4 peaks at 27 Kpps
- Pentium Pro 200, 64 MB RAM
A high level view of new system
[Figure: packets P per netdev, split into an interrupt area and a polling area, with a Quota line]
➔ P packets to deliver to the stack (on the RX ring)
➔ Horizontal line shows different netdevs with different input rates
➔ Area under curve shows how many packets before next interrupt
➔ Quota enforces fair share
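
The quota idea above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: each device starts with a fixed RX backlog, no new packets arrive, and the names `poll_round_robin`, `rx_rings` and `quota` are made up for the example rather than taken from the kernel.

```python
from collections import deque

def poll_round_robin(rx_rings, quota):
    """Serve devices in turn; each turn delivers at most `quota` packets."""
    backlog = dict(rx_rings)
    delivered = {name: 0 for name in rx_rings}
    # Devices with pending packets sit on the poll list (interrupts off).
    poll_list = deque(name for name, pending in backlog.items() if pending > 0)
    schedule = []
    while poll_list:
        dev = poll_list.popleft()
        work = min(quota, backlog[dev])
        backlog[dev] -= work
        delivered[dev] += work
        schedule.append((dev, work))
        if backlog[dev] > 0:
            poll_list.append(dev)  # not drained: stay in polling mode
        # else: ring drained, the device would re-enable its RX interrupt
    return schedule, delivered

# Hypothetical load: eth0 is the "hogger", eth1 has only a few packets.
schedule, delivered = poll_round_robin({"eth0": 1000, "eth1": 10}, quota=64)
print(schedule[:3])
```

With the quota in place the lightly loaded device is served right after the first batch from the hogger instead of waiting behind its whole backlog.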
NAPI observations & issue: fairness
Ping latency/fairness under extreme load/SMP
[Figure: bar chart of ping latency in microseconds, legend Idle and DoS; y-axis 0 to 600]
Bar values as extracted: 122 123 99 380 95 408 95 541 93 254 101 323 105 540 96 202 96 190
● Idle: ping through an idle router
● DoS: ping through a router under a DoS attack @ 890 kpps
Very well behaved: just an increase of a couple of hundred microseconds!!
NAPI Kernel support
The NAPI kernel part was included in 2.5.7 and backported to 2.4.20
Current driver support:
● e1000, Intel GIGE NICs (UFO driver). First driver where RX & TX are done in softirq
● tg3, BroadCom GIGE NICs
● dl2k, D-Link GIGE NICs
● tulip (pending), 100 Mb/s
Forwarding performance (old)
[Figure: Linux forwarding rate at different pkt sizes. Linux 2.5.58 UP/skb recycling, 1.8 GHz Xeon. Series: Input, Throughput. x-axis: packet size (64, 128, 256, 512, 1024, 1518); y-axis: kpps, 0 to 900]
Fills a GIGE pipe starting from 256 byte pkts
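
For reference, the packet rate needed to fill a Gigabit Ethernet link at each packet size can be worked out directly. The sketch below assumes the packet size is the full Ethernet frame including CRC and adds the standard 20 bytes of per-frame overhead on the wire (preamble plus inter-frame gap); the function name `max_pps` is only illustrative.

```python
# 8 bytes preamble/SFD + 12 bytes inter-frame gap accompany every frame.
ETH_OVERHEAD_BYTES = 20

def max_pps(link_bps, frame_bytes):
    """Theoretical maximum frames per second on an Ethernet link."""
    return link_bps / ((frame_bytes + ETH_OVERHEAD_BYTES) * 8)

for size in (64, 128, 256, 512, 1024, 1518):
    print(f"{size:5d} bytes: {max_pps(1e9, size) / 1e3:7.1f} kpps")
```

Under these assumptions a GIGE link carries about 1488 kpps of 64-byte frames but only about 453 kpps of 256-byte frames, which is why a router can saturate the link at larger packet sizes well before it can at the smallest ones.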
ipv6 performance (old)
[Figure: bar chart, forwarding kpps with 76 byte pkts. Linux 2.5.12, 1 CPU (SMP), Opteron 1.6 GHz, e1000. y-axis: T-put, 0 to 700. Bars as extracted: Single flow small, Single flow 543 r, rDoS 543 r]
How does rDoS work on a sparse routing table?
fib_trie performance comparison
[Figure: bar chart, fib_hash vs fib_trie, forwarding kpps. Linux 2.6.16, 1 CPU used (SMP), Opteron 1.6 GHz, e1000. y-axis 0 to 700. Legend as extracted: dsh hash, 5 r single flow, 5 r rDoS, 123kr rDoS]
Preroute patches to disable route hash
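
As background for the comparison, the lookup that both fib_hash and fib_trie perform is longest-prefix match. The sketch below uses a plain, uncompressed binary trie for IPv4, which is not the level-compressed trie used by fib_trie; the class name `PrefixTrie` and the next-hop strings are hypothetical.

```python
import ipaddress

class PrefixTrie:
    """Uncompressed binary trie for IPv4 longest-prefix match."""

    def __init__(self):
        # Each node is a dict with optional children "0"/"1" and "hop".
        self.root = {}

    def insert(self, prefix, next_hop):
        net = ipaddress.ip_network(prefix)
        bits = format(int(net.network_address), "032b")[:net.prefixlen]
        node = self.root
        for bit in bits:
            node = node.setdefault(bit, {})
        node["hop"] = next_hop

    def lookup(self, addr):
        bits = format(int(ipaddress.ip_address(addr)), "032b")
        node = self.root
        best = node.get("hop")  # default route, if any
        for bit in bits:
            node = node.get(bit)
            if node is None:
                break
            best = node.get("hop", best)  # remember the longest match so far
        return best

fib = PrefixTrie()
fib.insert("0.0.0.0/0", "default-gw")
fib.insert("10.0.0.0/8", "gw-a")
fib.insert("10.1.0.0/16", "gw-b")
print(fib.lookup("10.1.2.3"), fib.lookup("10.2.0.1"), fib.lookup("192.0.2.1"))
```

A trie like this walks one bit per step; level and path compression reduce the number of steps, which matters with a full BGP table.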
32/64 bit || sizeof(sk_buff)
[Figure: bar chart of sizeof(struct sk_buff) for 32 and 64 bit; y-axis: size, 0 to 300]
[Figure: bar chart of relative forwarding T-put, 64 bit vs 32 bit; y-axis 0 to 0.6]
Gcc 3.4, x86_64 vs i686 on same HW
Trash data-structure
Interesting novel approach. Trie-Hash --> Trash
Found when extending the LC-trie
Paper with Stefan Nilsson/KTH
Exploits the fact that key length does not affect tree depth
We lengthen the key so it can be better compressed.
Implemented in a Linux forwarding patch as a replacement for the route hash.
Trash data-structure
Can do full key lookup: src/dst/sport/dport/proto/if etc, and later socket.
Even for ip6, with little performance degradation
Could be a candidate for the grand unified lookup
Full flow lookup can understand connections.
Free flow logging etc
New garbage collection (GC) possible: active GC, called AGC in the paper. It listens to TCP SYN, FIN and RST. Shown to be a performance winner.
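
The active GC idea can be sketched with a flow table keyed on the full flow tuple. This is an illustration only: a Python dict stands in for the Trash structure, closing a flow on the first FIN or RST is a simplification of real TCP teardown, and the names `FlowKey` and `FlowTable` are invented for the example.

```python
from typing import NamedTuple

class FlowKey(NamedTuple):
    src: str
    dst: str
    sport: int
    dport: int
    proto: int

class FlowTable:
    """Flow cache where TCP FIN/RST removes the entry immediately."""

    def __init__(self):
        self.flows = {}  # a dict stands in for the Trash structure

    def packet(self, key, tcp_flags=frozenset()):
        # Active GC: the connection is ending, so drop its entry now
        # instead of waiting for a periodic timeout-based sweep.
        if set(tcp_flags) & {"FIN", "RST"}:
            self.flows.pop(key, None)
            return None
        entry = self.flows.get(key)
        if entry is None:
            entry = self.flows[key] = {"packets": 0}
        entry["packets"] += 1
        return entry

table = FlowTable()
key = FlowKey("192.0.2.1", "198.51.100.7", 40000, 80, 6)
table.packet(key, {"SYN"})
table.packet(key)
print(table.flows[key]["packets"])
table.packet(key, {"FIN"})
print(key in table.flows)
```

Because entries disappear as soon as connections end, the table stays close to the number of live flows, and the per-flow counters give flow logging for free.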
Fully parallel router: multi-queue breakthrough
Load from one incoming 10g interface can be split among several CPU cores
Using RSS (Receive Side Scaling). New NIC HW classifier
MSI-X interrupt affinity for RX and TX, so a packet (an skb) is handled by one CPU core.
A breakthrough for forwarding and for networking in general.
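
How a classifier can spread one interface's load over several cores while keeping each flow on a single core can be sketched as follows. Real RSS hardware typically uses a Toeplitz hash with an indirection table; the sketch substitutes CRC32 as a stand-in hash, and the function name `rx_queue` and the addresses are hypothetical.

```python
import zlib

def rx_queue(src, dst, sport, dport, n_queues):
    """Map a flow to an RX queue; the same flow always gets the same queue."""
    key = f"{src}|{dst}|{sport}|{dport}".encode()
    # CRC32 is only a stand-in here, not the hash a NIC would use.
    return zlib.crc32(key) % n_queues

# Hypothetical flows arriving on one 10g interface, 4 RX queues (one per core).
flows = [("10.0.0.1", "10.0.1.1", 1024 + i, 80) for i in range(8)]
for flow in flows:
    print(flow, "-> queue", rx_queue(*flow, n_queues=4))
```

Since the queue depends only on the flow tuple, packets of one flow are never reordered across cores, and each queue's MSI-X interrupt can be bound to its own core.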
Fully parallel router concept: multi-queue breakthrough
In the experiment we used Intel 82598 adapters.
Intel follows MS NDIS 6.0 for virtualization
SUN's 10g board has a more potent HW classifier, aka TCAM.
Potent classifiers can be yet another breakthrough for both functions and performance.
Control plane separation (routing daemons), QoS, filters etc.
Fully parallel router: multi-queue breakthrough
Flow load. 31,000 fib_lookups/sec
BGP table w. 271,064 routes
Three different packet sizes: 64 bytes 45%, 576 bytes 25%, 1500 bytes 30%
RSS and Multi-Queue (RX and TX) in use
Linux 2.6.27-rc2, ixgbe-1.3.31.5 + patches
Using 2/4 CPU cores from AMD Barcelona 2.3 GHz
Forwarding: 6.2 Gbit/s (960 kpps)
10g boards: multi-queue breakthrough
SUN's seems to use XFPs. Anyone using it?
Other boards with SFP/SFP+/XFP??
[Figure: SLU network diagram. Node labels: expgw.data, ultrouter6, ultrouter7, ultrouter8, ultrouter9, ultGC-gw, ultKC-gw, ultgw-1, ultgw-2, skara-gw, Switch HVC knutpunkt (exchange point, 193.10.131.0/24), SLU1, SLU2, UU, GigaSUNET, DMZ UU/ITS. Site labels: KC, GC, HVC, DC. One link is marked 34 Mb. Interface names (e0 to e10), link costs and partial addresses are not recoverable from the extraction.]
SLU's network (partial)
[Figure: SLU network diagram, repeated from the earlier slide]
BGP policy routing
ISPs (SUNET) and exchange point.
[Figure: SLU network diagram, repeated from the earlier slide]
Redundant inner core
[Figure: SLU network diagram, repeated from the earlier slide]
Redundant connection of heavy server networks via router discovery
[Figure: SLU network diagram, repeated from the earlier slide]