View
214
Download
0
Tags:
Embed Size (px)
Citation preview
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
1
High Throughput: Progress and Current Results
Lots of people helped:MB-NG team at UCLMB-NG team at ManchesterAndrew McNab
MB - NG
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
2
What’s Involved in Performance
End Hosts NIC: operation & PCI design – use of modern PCI commands
Chipset & PCIX
CPU, memory & memory-bus
OS kernel: drivers & TCP UDP IP stack
Disk sub-system: disks, controllers & interconnects
Routers Blades: operation
Switching / routing fabric: bus, crossbar, non-blocking operation
Policies
The Network (s) Framing
Bandwidth
Load & Congestion
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
3PC PC
MB – NG SuperJANET4 Development Network (22 Mar 02)
UCL
OSM-1OC48-POS-SS
MCC
OSM-1OC48-POS-SS
MAN
Gigabit Ethernet2.5 Gbit POS Access2.5 Gbit POS coreMPLS Admin. Domains
SJ4 Dev
SJ4 DevSJ4
Dev
SJ4 Dev
PC PCPC
3ware RAID0
PC
3ware RAID0
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
4
Initial Measurements
UDP throughput very variable
TCP throughput very variable Lon Man < Man Lon ?
Investigations: Packet loss several % Errors on interface
(ifconfig) Overruns
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
5
txqueuelen-vs-sendstalls
TCP throughput very variable Lon Man < Man Lon ? throughput very variable
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
6
Interrupt Coalescence Investigations TCP mem-mem lon2-man1 Tx 64 Tx-abs 64 Rx 0 Rx-abs 128 820-980 Mbit/s +- 50 Mbit/s
Tx 64 Tx-abs 64 Rx 20 Rx-abs 128 937-940 Mbit/s +- 1.5 Mbit/s
Tx 64 Tx-abs 64 Rx 80 Rx-abs 128 937-939 Mbit/s +- 1 Mbit/s
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
7
24 Hours HighSpeed TCP mem-mem TCP mem-mem lon2-man1 Tx 64 Tx-abs 64 Rx 64 Rx-abs 128 941.5 Mbit/s +- 0.5 Mbit/s
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
8
Raid0 Performance (1)
Maxdor 3.5 Series DiamondMax PLus 9 120 Gb ATA/133
WriteSlight increase with number of disks
Read
3 Disks OK
Write 100 MBytes/s Read 130 MBytes/s
Read Throughput and no. Disks Stripe 64k raid0
0
20
40
60
80
100
120
140
160
0 200 400 600 800 1000 1200 1400 1600 1800 2000
File size MBytes
Mb
yte
s/s
2 disk
3 disk
4 disk
Write Throughput and no. Disks Stripe 64k raid0
0
20
40
60
80
100
120
140
160
180
0 200 400 600 800 1000 1200 1400 1600 1800 2000
File size MBytes
Mb
yte
s/s
2 disk
3 disk
4 disk
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
9
Raid0 Performance (2)
Maxdor 3.5 Series DiamondMax PLus 9 120 Gb ATA/133
No difference for Write
Larger Stripe lower the performance
Write 100 MBytes/s Read 120 MBytes/s
Disk-mem Write Throughput vs Stripe Size - 3 disk raid0
0
20
40
60
80
100
120
140
160
180
0 200 400 600 800 1000 1200 1400 1600 1800 2000
File size MBytes
MB
yte
s/s
64
128
256
512
1000
Disk-mem Read Throughput vs Stripe Size - 3 disk raid0
0
20
40
60
80
100
120
140
160
0 200 400 600 800 1000 1200 1400 1600 1800 2000
File size MBytes
Mb
yte
s/s
64
128
256
512
1000
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
10
Gridftp Throughput HighSpeedTCP Int Coal 64 128 Txqueuelen 2000 TCP buffer 1 M byte
(rtt*BW = 750kbytes)
Interface throughput
Acks received
Data moved 520 Mbit/s
Same for B2B tests
So its not that simple!
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
11
Gridftp Throughput + Web100
Throughput Mbit/s:
See alternate 600/800 Mbitand zero
Cwnd smooth No dup Ack / send stall /
timeouts
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
12
Gridftp Throughput + Web100
Throughput Mbit/s vsRecv Window Size
Zero throughputindependent of Recv Window Size
Bytes sent Bytes received
Waits 0.4s at start !
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
13
http data transfers HighSpeed TCP
Apachie web server out of the box!
prototype client - curl http library 1Mbyte TCP buffers 2Gbyte file Throughput 72 MBytes/s Cwnd - some variation No dup Ack / send stall /
timeouts
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
14
http data transfers (2)
Limited by: Sender
Receive window size
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
15
Int Coal 64 128 Txqueuelen 500
(no stall in rtt) TCP buffer 750k byte
(rtt*BW = 750k bytes)
1 stream every 60 s: man1 lon2 man2 lon2 man3 lon2
Sample ever 10ms Send rates:
940 Mbit/s 450 Mbit/s 300 Mbit/s
TCP sharing man1-lon2 mem-mem web100
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
16
Int Coal 64 128 Txqueuelen 500
(no stall in rtt) TCP buffer 750k byte
(rtt*BW = 750k bytes)
1 stream every 60 s: man1 lon2 man2 lon2 man3 lon2
Sample ever 10ms Time in send limit:
Sender Cwind Recv wind
TCP sharing man1-lon2 mem-mem web100
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
17
1Stream:
No Dup ACKs No SACKs No Sendstalls Why does Cwnd vary
TCP sharing man1-lon2 the WHY?
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
18
2Streams:
Many Dup ACKs Many SACKs Why does Cwnd
have large variations
2 TCP streams man1-lon2 - the WHY?
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
19
2Streams:
Dips in throughput due to Dup ACK
~4 losses /sec A bit regular ?
Cwnd decreases: 1 point 33% Ramp starts at 62% Slope 70Bytes/us
2 TCP streams man1-lon2 - the WHY? (2)
1 sec
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
20
3Streams:
Dips in throughput due to Dup ACK
3 TCP streams man1-lon2 - the WHY?
10 sec
DataTAG Meeting CERN 7-8 May 03R. Hughes-Jones Manchester
21
TCP sharing man1-lon2 - the WHY?
There is (a) correlation