ESLEA Technical Collaboration Meeting, 20-21 Jun 2006, R. Hughes-Jones, Manchester


Protocols: Recent and Current Work

Richard Hughes-Jones, The University of Manchester

www.hep.man.ac.uk/~rich/ then “Talks”


Outline

SC|05: TCP and UDP memory-to-memory and disk-to-disk flows; 10 Gbit Ethernet.

VLBI: the Jodrell Mark5 problem (see Matt’s talk); data delay on a TCP link, and how suitable TCP is (4th Year MPhys project, Stephen Kershaw & James Keenan); throughput on the 630 Mbit JB-JIVE UKLight link; 10 Gbit in FABRIC.

ATLAS: network tests on the Manchester T2 farm; the Manc-Lanc UKLight link; ATLAS remote farms.

RAID tests: HEP server with an 8-lane PCIe RAID card.


Collaboration at SC|05

SCINet; Caltech booth; the BWC at the SLAC booth; ESLEA; Boston Ltd. & Peta-Cache; Sun; StorCloud.


Bandwidth Challenge Wins Hat Trick

The maximum aggregate bandwidth was >151 Gbit/s: equivalent to 130 DVD movies in a minute, or to serving 10,000 MPEG2 HDTV movies in real time.
22 10 Gigabit Ethernet waves to the Caltech & SLAC/FERMI booths.
In 2 hours, 95.37 TByte were transferred; over 24 hours, ~475 TBytes were moved.
Real-time particle event analysis was shown.

SLAC/Fermi/UK booth:
1 x 10 Gbit Ethernet to the UK via NLR & UKLight: transatlantic HEP disk-to-disk transfers and VLBI streaming.
2 x 10 Gbit links to SLAC: rootd low-latency file access application for clusters; Fibre Channel StorCloud.
4 x 10 Gbit links to Fermi: dCache data transfers.

[Figure: traffic in to and out of the booth per link (SLAC-ESnet, FermiLab-HOPI, SLAC-ESnet-USN, FNAL-UltraLight, UKLight), with the SC2004 level of 101 Gbit/s marked]
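The volume figures convert to average rates with simple arithmetic. A quick Python check, assuming 1 TByte = 10^12 bytes (the slide does not say whether decimal or binary units were used):

```python
def average_rate_gbps(n_bytes: float, seconds: float) -> float:
    """Average rate in Gbit/s for n_bytes moved in `seconds`."""
    return n_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    TB = 1e12  # assumption: decimal terabytes
    print(f"95.37 TByte in 2 h: {average_rate_gbps(95.37 * TB, 2 * 3600):.0f} Gbit/s")
    print(f"475 TByte in 24 h:  {average_rate_gbps(475 * TB, 24 * 3600):.0f} Gbit/s")
```

Under that assumption the 2-hour average is about 106 Gbit/s, below the >151 Gbit/s peak, and the 24-hour average is about 44 Gbit/s.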


ESLEA and UKLight

6 x 1 Gbit transatlantic Ethernet layer 2 paths over UKLight + NLR.
Disk-to-disk transfers with bbcp from Seattle to the UK: TCP buffer and application set to give ~850 Mbit/s; one stream of data ran at 840-620 Mbit/s.
UDP VLBI data streamed from the UK to Seattle at 620 Mbit/s.
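Setting the TCP buffer usually means asking for socket send and receive buffers at least as large as the bandwidth-delay product of the path. A minimal Python sketch using the standard `setsockopt` calls; the helper name `request_tcp_buffers` and the 1 Gbit/s, 150 ms path are hypothetical, not the settings used for the bbcp runs above. On Linux the request is capped by `net.core.wmem_max` / `net.core.rmem_max`, so the sketch reports what the kernel actually granted.

```python
import socket
from typing import Tuple


def request_tcp_buffers(sock: socket.socket, size: int) -> Tuple[int, int]:
    """Ask the kernel for `size`-byte send and receive buffers (hypothetical helper).

    Returns the (send, receive) sizes actually granted: the kernel may
    round, double (Linux does, for bookkeeping) or cap the request.
    """
    # Set before connect()/listen(): the TCP window-scale factor is fixed
    # in the SYN exchange, so a later increase may not be usable.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, size)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    snd = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    rcv = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return snd, rcv


if __name__ == "__main__":
    # Hypothetical path: 1 Gbit/s target with a 150 ms round-trip time,
    # so the buffer should hold at least RTT * BW / 8 bytes.
    rtt_s, rate_bps = 0.150, 1e9
    wanted = int(rtt_s * rate_bps / 8)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        snd, rcv = request_tcp_buffers(s, wanted)
    print(f"wanted {wanted} B, granted snd={snd} B, rcv={rcv} B")
```

If the granted sizes come back smaller than the BDP, the path cannot be filled by a single stream until the system limits are raised.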

[Figure: rate (Mbit/s) vs date-time, 16:00 to 23:00, for the SC|05 links sc0501 to sc0504 (0-1000 Mbit/s each) and for UKLight (0-4500 Mbit/s); one trace labelled “Reverse TCP”]


SLAC 10 Gigabit Ethernet

2 lightpaths: one routed over ESnet, one layer 2 over Ultra Science Net.
6 Sun V20Z systems per λ.
dCache remote disk data access: 100 processes per node; each node sends or receives; one data stream runs at 20-30 Mbit/s.
Used Neterion NICs & Chelsio TOE.
Data also sent to StorCloud using Fibre Channel links.
Traffic on the 10 GE link for 2 nodes: 3-4 Gbit/s per node, 8.5-9 Gbit/s on the trunk.


VLBI Work

TCP Delay and VLBI Transfers

Manchester 4th Year MPhys Project by Stephen Kershaw & James Keenan


VLBI Network Topology


VLBI Application Protocol

VLBI data is constant bit rate (CBR).

tcpdelay: an instrumented TCP program that emulates sending CBR data and records the relative 1-way delay.
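Sender and receiver clocks are not synchronised, so receive time minus send time carries an unknown constant offset. One common way to handle this, assumed here rather than taken from the tcpdelay source, is to report each delay relative to the smallest one observed; this removes the clock offset but also the fixed path delay, which is why the result is a relative 1-way delay. A minimal Python sketch with made-up timestamps:

```python
from typing import List, Sequence


def relative_one_way_delay(send_ts: Sequence[float],
                           recv_ts: Sequence[float]) -> List[float]:
    """Per-message one-way delay relative to the smallest one seen.

    send_ts[i]: when message i was handed to TCP (sender clock).
    recv_ts[i]: when message i was read by the receiver (receiver clock).
    Subtracting the minimum cancels the unknown clock offset.
    """
    if len(send_ts) != len(recv_ts) or not send_ts:
        raise ValueError("need equal-length, non-empty timestamp lists")
    raw = [r - s for s, r in zip(send_ts, recv_ts)]
    base = min(raw)
    return [d - base for d in raw]


if __name__ == "__main__":
    # Hypothetical timestamps in ms; the receiver clock runs ~1000 ms ahead.
    send = [0.0, 0.5, 1.0, 1.5]
    recv = [1013.0, 1013.0, 1039.0, 1039.5]
    print(relative_one_way_delay(send, recv))
```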

[Figure: left, the sender emits Data1, Data2, Data3, Data4, ... each with a timestamp; they pass through TCP & the network to the receiver, with a packet loss marked. Right, sender-receiver exchange of segments and ACKs over one RTT.]

Segment time on wire = bits in segment / BW

Remember the bandwidth-delay product: BDP = RTT * BW
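Both rules of thumb are one-line formulas. A small Python sketch, assuming a 1 Gbit/s bottleneck and counting only the 1448-byte TCP payload (headers and Ethernet framing are ignored). At 1 Gbit/s, RTTs of 27 ms and 177 ms give about 3.4 MByte and 22 MByte, consistent with the BDP and socket size quoted in the backup slides.

```python
def segment_time_on_wire(segment_bytes: int, bw_bps: float) -> float:
    """Seconds to clock one segment onto the link: bits in segment / BW."""
    return segment_bytes * 8 / bw_bps


def bdp_bytes(rtt_s: float, bw_bps: float) -> float:
    """Bandwidth-delay product BDP = RTT * BW, converted to bytes."""
    return rtt_s * bw_bps / 8


if __name__ == "__main__":
    bw = 1e9  # assumed 1 Gbit/s bottleneck
    # Payload only; real frames carry TCP/IP/Ethernet overhead as well.
    print(f"1448 B segment: {segment_time_on_wire(1448, bw) * 1e6:.2f} us on the wire")
    for rtt_ms in (26, 27, 177):
        print(f"RTT {rtt_ms} ms: BDP = {bdp_bytes(rtt_ms / 1000, bw) / 1e6:.2f} MB")
```

A TCP buffer smaller than the BDP caps the data in flight, and hence the throughput, below the link rate.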


Check the Send Time

10,000 messages; message size 1448 bytes; wait time 0; TCP buffer 64k; route Man-ukl-JIVE-prod-Man; RTT ~26 ms.
Measured slope: 0.44 ms/message. From the TCP buffer size & RTT, expect ~42 messages/RTT, i.e. ~0.6 ms/message.

[Figure: send time (s) vs message number for 10,000 packets; scale bar 1 s]
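A back-of-envelope version of the expected rate, assuming the 64k buffer is 65,536 bytes, holds whole 1448-byte messages, and is drained once per RTT. This naive estimate gives 45 messages per RTT and ~0.58 ms per message, close to the ~42 messages and ~0.6 ms/message quoted above; how much of the buffer is really usable for payload is not modelled. It also shows the implied throughput is only about 20 Mbit/s.

```python
def buffer_limited_rate(buffer_bytes: int, msg_bytes: int, rtt_s: float):
    """Messages per RTT, mean time per message and throughput for a sender
    whose TCP send buffer (not the network) limits the data in flight."""
    msgs_per_rtt = buffer_bytes // msg_bytes      # whole messages only
    time_per_msg = rtt_s / msgs_per_rtt
    throughput_bps = msgs_per_rtt * msg_bytes * 8 / rtt_s
    return msgs_per_rtt, time_per_msg, throughput_bps


if __name__ == "__main__":
    n, t, r = buffer_limited_rate(65536, 1448, 0.026)
    print(f"{n} msgs/RTT, {t * 1e3:.2f} ms/msg, {r / 1e6:.1f} Mbit/s")
```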


Send Time Detail

[Figure: send time (s) vs message number, zoomed; scale bar 100 ms; messages 76 to 102 (26 messages) marked, with annotations “about 25 μs” and “one RTT”]

TCP is send-buffer limited: after slow start the buffer is full, so packets are sent out in bursts each RTT and the program is blocked on sendto().


1-Way Delay

[Figure: 1-way delay vs message number for 10,000 packets; scale bar 100 ms]

10,000 messages; message size 1448 bytes; wait time 0; TCP buffer 64k; route Man-ukl-JIVE-prod-Man; RTT ~26 ms.


1-Way Delay Detail

[Figure: 1-way delay vs message number, zoomed; scale bar 10 ms; delay levels marked “= 1 x RTT (26 ms)” and “= 1.5 x RTT”, and noted as not 0.5 x RTT]

Why not just 1 RTT? After slow start the TCP buffer is full. Messages at the front of the TCP send buffer have to wait for the next burst of ACKs, 1 RTT later; messages further back in the TCP send buffer wait for 2 RTT.


1-Way Delay with Packet Drop

Route: LAN gig8-gig1; ping 188 μs.
10,000 messages; message size 1448 bytes; wait time 0 μs; drop 1 in 1000.

[Figure: 1-way delay vs message number; scale bars 5 ms and 10 ms; annotations 800 μs and 28 ms]

Manc-JIVE tests show times increasing with a “saw-tooth” around 10 s.


10 Gbit in FABRIC


FABRIC 4 Gbit Demo

4 Gbit lightpath between GÉANT PoPs, in collaboration with Dante.
Continuous (days-long) data flows: VLBI_UDP and multi-Gigabit TCP tests.


10 Gigabit Ethernet: UDP Data Transfer on PCI-X

Sun V20z 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509; XFrame II NIC; PCI-X mmrbc 2048 bytes, 66 MHz.
One 8000 byte packet: 2.8 μs for CSRs, 24.2 μs data transfer, effective rate 2.6 Gbit/s.
2000 byte packets, wait 0 μs: ~200 ms pauses.
8000 byte packets, wait 0 μs: ~15 ms between data blocks.

[Figure: PCI-X bus activity showing CSR access (2.8 μs) and data transfer]
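The effective rate is the bits moved divided by the data-transfer time seen on the bus. A short Python check, which also shows the rate if the CSR access is counted and compares both with the theoretical peak of the bus; the peak assumes the standard 64-bit PCI-X width at the stated 66 MHz and ignores protocol overheads.

```python
def effective_rate_bps(n_bytes: int, transfer_s: float) -> float:
    """Bits transferred divided by the time the transfer occupied."""
    return n_bytes * 8 / transfer_s


def pcix_peak_bps(clock_hz: float, width_bits: int = 64) -> float:
    """Theoretical peak: one bus-width word per clock cycle."""
    return clock_hz * width_bits


if __name__ == "__main__":
    data_only = effective_rate_bps(8000, 24.2e-6)              # data phase
    with_csr = effective_rate_bps(8000, (24.2 + 2.8) * 1e-6)   # plus CSR access
    peak = pcix_peak_bps(66e6)
    print(f"data phase {data_only / 1e9:.2f} Gbit/s, "
          f"incl. CSR {with_csr / 1e9:.2f} Gbit/s, "
          f"66 MHz x 64 bit peak {peak / 1e9:.2f} Gbit/s")
```

The data-phase figure reproduces the quoted 2.6 Gbit/s; the gap to the ~4.2 Gbit/s bus peak is consistent with the later mmrbc tuning results.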


ATLAS


ESLEA: ATLAS on UKLight

1 Gbit lightpath Lancaster-Manchester.
Disk-to-disk transfers.
Storage Element with SRM using distributed disk pools: dCache & xrootd.


udpmon: Lanc-Manc Throughput

Lanc to Manc: plateau ~640 Mbit/s wire rate, no packet loss.
Manc to Lanc: ~800 Mbit/s, but with packet loss.
Send times: a pause of 695 μs every 1.7 ms, so expect ~600 Mbit/s.
Receive times (Manc end): no corresponding gaps.
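The ~600 Mbit/s expectation follows from the pauses in the send times: the average rate is the line rate times the fraction of each period spent sending. A minimal sketch, assuming the 1.7 ms is the full period including the 695 μs pause and that the sender otherwise runs at the 1 Gbit/s line rate:

```python
def rate_with_pauses(line_rate_bps: float, pause_s: float, period_s: float) -> float:
    """Average rate when the sender is idle for pause_s in every period_s."""
    if not 0 <= pause_s < period_s:
        raise ValueError("pause must be non-negative and shorter than the period")
    busy_fraction = (period_s - pause_s) / period_s
    return line_rate_bps * busy_fraction


if __name__ == "__main__":
    r = rate_with_pauses(1e9, 695e-6, 1.7e-3)
    print(f"expected average: {r / 1e6:.0f} Mbit/s")
```

This gives about 591 Mbit/s, in line with the ~600 Mbit/s quoted.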

[Figure: pyg13-gig1, 19 Jun 06. Panels: 1-way delay (μs) vs receive time and vs send time (units of 0.1 μs); received wire rate (Mbit/s) vs spacing between frames (μs) for user data sizes 50 to 1472 bytes]


udpmon: Manc-Lanc Throughput

Manc to Lanc: plateau ~890 Mbit/s wire rate.
Packet loss at line rate: 10% for large frames, 60% for small frames.

[Figure: gig1-pyg13, 20 Jun 06: received wire rate (Mbit/s) vs spacing between frames (μs) for user data sizes 50 to 1472 bytes]

[Figure: gig1-pyg13, 20 Jun 06: % packet loss vs spacing between frames (μs) for user data sizes 50 to 1472 bytes; 1-way delay (μs) vs packet number]
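udpmon plots the received wire rate against the spacing between frames. One plausible accounting, assumed here rather than quoted from the udpmon source, counts every byte a frame occupies on Gigabit Ethernet: user data plus UDP (8), IP (20), Ethernet header (14) and CRC (4), preamble (8) and the minimum inter-frame gap (12), i.e. 66 bytes of overhead. A minimal sketch:

```python
# Assumed per-frame overhead on Gigabit Ethernet, in bytes.
UDP_HDR, IP_HDR, ETH_HDR, ETH_CRC, PREAMBLE, IFG = 8, 20, 14, 4, 8, 12
OVERHEAD = UDP_HDR + IP_HDR + ETH_HDR + ETH_CRC + PREAMBLE + IFG  # 66


def wire_rate_bps(user_bytes: int, spacing_s: float) -> float:
    """Wire rate when a frame carrying user_bytes arrives every spacing_s."""
    return (user_bytes + OVERHEAD) * 8 / spacing_s


def min_spacing_s(user_bytes: int, line_rate_bps: float = 1e9) -> float:
    """Smallest possible spacing: the frame's full time on the wire."""
    return (user_bytes + OVERHEAD) * 8 / line_rate_bps


if __name__ == "__main__":
    for size in (50, 400, 1472):
        print(f"{size:5d} B: line rate reached at spacing {min_spacing_s(size) * 1e6:.2f} us")
    print(f"1472 B every 20 us: {wire_rate_bps(1472, 20e-6) / 1e6:.0f} Mbit/s")
```

With this accounting a 1472-byte datagram (the largest that fits a 1500-byte MTU) reaches line rate at a spacing of about 12.3 μs, which is where the plateau in such plots would begin.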


ATLAS Remote Computing: Application Protocol

Event request: the EFD requests an event from the SFI; the SFI replies with the event (~2 Mbytes).
Processing of the event.
Return of computation: the EF asks the SFO for buffer space; the SFO sends OK; the EF transfers the results of the computation.
tcpmon: an instrumented TCP request-response program that emulates the Event Filter EFD to SFI communication.

[Figure: message sequence between the Event Filter Daemon (EFD) and the SFI and SFO: request event, send event data, process event, request buffer, send OK, send processed event; the request-response time is histogrammed]
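To make the request-response pattern concrete, here is a self-contained Python sketch in the spirit of tcpmon: a client sends a small request and times how long it takes to receive a fixed-size response. Both ends run on localhost; the 64-byte request and 1 Mbyte response mirror the Manc-CERN tests, but the function names, the threaded responder and the lack of Web100 instrumentation are illustrative assumptions, not tcpmon's design.

```python
import socket
import threading
import time
from typing import List

REQ_BYTES = 64            # request size
RESP_BYTES = 1_000_000    # response size, "1 Mbyte"


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes early."""
    chunks, got = [], 0
    while got < n:
        chunk = sock.recv(min(65536, n - got))
        if not chunk:
            raise ConnectionError("peer closed connection")
        chunks.append(chunk)
        got += len(chunk)
    return b"".join(chunks)


def serve(listener: socket.socket, n_requests: int) -> None:
    """Hypothetical responder: answer each request with RESP_BYTES."""
    conn, _ = listener.accept()
    with conn:
        payload = b"x" * RESP_BYTES
        for _ in range(n_requests):
            recv_exact(conn, REQ_BYTES)
            conn.sendall(payload)


def request_response_times(n_requests: int, wait_s: float = 0.0) -> List[float]:
    """Return the request-to-complete-response time for each request."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))        # any free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve, args=(listener, n_requests))
        server.start()
        times = []
        with socket.create_connection(("127.0.0.1", port)) as cli:
            for _ in range(n_requests):
                t0 = time.perf_counter()
                cli.sendall(b"r" * REQ_BYTES)
                recv_exact(cli, RESP_BYTES)
                times.append(time.perf_counter() - t0)
                time.sleep(wait_s)             # idle gap between events
        server.join()
    return times


if __name__ == "__main__":
    for i, t in enumerate(request_response_times(5, wait_s=0.05)):
        print(f"event {i}: {t * 1e3:.2f} ms")
```

On localhost the RTT is tiny, so slow-start effects barely show; the same client run across a 20 ms path is where the round-trip counts matter.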


tcpmon: TCP Activity Manc-CERN Req-Resp

Web100 hooks for TCP status.
Round trip time 20 ms; 64 byte request (green), 1 Mbyte response (blue).
TCP is in slow start: the 1st event takes 19 RTT, or ~380 ms.
The TCP congestion window gets reset on each request: the TCP stack applies the RFC 2581 & RFC 2861 reduction of Cwnd after inactivity.
Even after 10 s, each response takes 13 RTT, or ~260 ms.
Transfer achievable throughput: 120 Mbit/s.
Event rate very low: application not happy!

[Figure: Web100 traces vs time (ms): DataBytesOut and DataBytesIn (delta), CurCwnd (value), and TCP achieved throughput (Mbit/s)]
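The cost of restarting from a small congestion window can be estimated by counting round trips: in slow start the window grows by a roughly constant factor per RTT until the whole response has been sent. A deliberately idealised Python sketch; the initial window of 2 segments, the growth factors (2 per RTT with an ACK per segment, around 1.5 often quoted with delayed ACKs) and the 1448-byte segment size are assumptions, and losses, receiver limits and connection setup are ignored, so it is not meant to reproduce the measured 19 and 13 RTT.

```python
import math


def slow_start_rtts(response_bytes: int, mss: int = 1448,
                    init_cwnd: float = 2.0, growth: float = 2.0) -> int:
    """Round trips to send response_bytes starting from init_cwnd segments,
    multiplying the window by `growth` each RTT (idealised slow start)."""
    segments = math.ceil(response_bytes / mss)
    sent, cwnd, rtts = 0.0, init_cwnd, 0
    while sent < segments:
        sent += cwnd          # one window's worth per round trip
        cwnd *= growth
        rtts += 1
    return rtts


if __name__ == "__main__":
    rtt = 0.020  # 20 ms, as on the Manc-CERN path
    for g in (2.0, 1.5):
        n = slow_start_rtts(1_000_000, growth=g)
        print(f"growth x{g}: {n} RTT ~ {n * rtt * 1e3:.0f} ms for 1 Mbyte")
```

On Linux the window reduction after idle is governed by the `net.ipv4.tcp_slow_start_after_idle` sysctl (0 disables it); the slides do not say how the no-reduction runs were configured.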


tcpmon: TCP Activity Manc-CERN Req-Resp, No Cwnd Reduction

Round trip time 20 ms; 64 byte request (green), 1 Mbyte response (blue).
TCP starts in slow start: the 1st event takes 19 RTT, or ~380 ms.
The TCP congestion window grows nicely.
After ~1.5 s, a response takes 2 RTT; rate ~10/s (with a 50 ms wait).
Transfer achievable throughput grows to 800 Mbit/s.
Data are transferred WHEN the application requires the data.

[Figure: Web100 traces vs time (ms): DataBytesOut and DataBytesIn (delta), PktsOut and PktsIn (delta), CurCwnd (value), and TCP achieved throughput (Mbit/s); responses marked as taking 3 round trips, then 2 round trips]
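The quoted event rate can be checked by adding the network time per response to the application's wait between requests. A sketch assuming each event costs a whole number of round trips plus the fixed wait, with no processing time; the 13-RTT line applies the same 50 ms wait purely for comparison.

```python
def event_rate_hz(n_rtt: float, rtt_s: float, wait_s: float = 0.0) -> float:
    """Events per second if each event costs n_rtt round trips plus wait_s."""
    return 1.0 / (n_rtt * rtt_s + wait_s)


if __name__ == "__main__":
    rtt, wait = 0.020, 0.050
    for label, n in (("no cwnd reduction, 2 RTT", 2),
                     ("cwnd reset, 13 RTT (same wait assumed)", 13)):
        print(f"{label}: {event_rate_hz(n, rtt, wait):.1f} events/s")
```

Two round trips plus the wait give about 11 events/s, matching the ~10/s observed; thirteen round trips would cut this to about 3 events/s.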


Recent RAID Tests

Manchester HEP Server


“Server Quality” Motherboards

Boston/Supermicro H8DCi: two dual-core Opterons, 1.8 GHz; 550 MHz DDR memory; HyperTransport.
Chipset: nVidia nForce Pro 2200/2050; AMD 8132 PCI-X bridge.
PCI: 2 x 16-lane PCIe buses, 1 x 4-lane PCIe, 133 MHz PCI-X.
2 Gigabit Ethernet; SATA.


Disk_test: Areca PCI-Express 8-port controller, Maxtor 300 GB SATA disks

RAID0, 5 disks: read 2.5 Gbit/s, write 1.8 Gbit/s.
RAID5, 5 data disks: read 1.7 Gbit/s, write 1.48 Gbit/s.
RAID6, 5 data disks: read 2.1 Gbit/s, write 1.0 Gbit/s.

[Figure: afs6, Areca 8-port PCIe, 10 Jun 06: throughput (Mbit/s) vs file size (Mbytes) for 8k reads and writes; panels RAID0 5 disks, RAID5 5 disks, RAID6 7 disks]
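A minimal Python sketch of a sequential write-then-read throughput test with 8 kByte user blocks, similar in spirit to the 8k read/write curves; the function name, file location and sizes are hypothetical, and this is not the Disk_test program. The write includes an fsync so the data reach the device, but a read of a freshly written file will usually come from the page cache, so a meaningful RAID measurement needs files much larger than RAM.

```python
import os
import tempfile
import time


def sequential_throughput(path: str, file_bytes: int, block: int = 8192):
    """Write then read file_bytes sequentially in `block`-sized calls.

    Returns (write_bps, read_bps). Writes are fsync'ed; reads may be
    served from the page cache.
    """
    buf = os.urandom(block)
    n_blocks = file_bytes // block

    t0 = time.perf_counter()
    with open(path, "wb", buffering=0) as f:   # unbuffered: one call per block
        for _ in range(n_blocks):
            f.write(buf)
        os.fsync(f.fileno())                   # force data to the device
    write_s = time.perf_counter() - t0

    t0 = time.perf_counter()
    read_total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            read_total += len(chunk)
    read_s = time.perf_counter() - t0

    return n_blocks * block * 8 / write_s, read_total * 8 / read_s


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        w, r = sequential_throughput(os.path.join(d, "test.dat"), 16 * 2**20)
    print(f"write {w / 1e6:.0f} Mbit/s, read {r / 1e6:.0f} Mbit/s")
```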


Any Questions?


More Information: Some URLs (1)

UKLight web site: http://www.uklight.ac.uk
MB-NG project web site: http://www.mb-ng.net/
DataTAG project web site: http://www.datatag.org/
UDPmon / TCPmon kit + writeup: http://www.hep.man.ac.uk/~rich/net
Motherboard and NIC tests: http://www.hep.man.ac.uk/~rich/net/nic/GigEth_tests_Boston.ppt & http://datatag.web.cern.ch/datatag/pfldnet2003/
“Performance of 1 and 10 Gigabit Ethernet Cards with Server Quality Motherboards”, FGCS Special issue 2004, http://www.hep.man.ac.uk/~rich/
TCP tuning information: http://www.ncne.nlanr.net/documentation/faq/performance.html & http://www.psc.edu/networking/perf_tune.html
TCP stack comparisons: “Evaluation of Advanced TCP Stacks on Fast Long-Distance Production Networks”, Journal of Grid Computing 2004
PFLDnet: http://www.ens-lyon.fr/LIP/RESO/pfldnet2005/
Dante PERT: http://www.geant2.net/server/show/nav.00d00h002


More Information: Some URLs (2)

Lectures, tutorials etc. on TCP/IP:
www.nv.cc.va.us/home/joney/tcp_ip.htm
www.cs.pdx.edu/~jrb/tcpip.lectures.html
www.raleigh.ibm.com/cgi-bin/bookmgr/BOOKS/EZ306200/CCONTENTS
www.cisco.com/univercd/cc/td/doc/product/iaabu/centri4/user/scf4ap1.htm
www.cis.ohio-state.edu/htbin/rfc/rfc1180.html
www.jbmelectronics.com/tcp.htm
Encyclopaedia: http://www.freesoft.org/CIE/index.htm
TCP/IP resources: www.private.org.il/tcpip_rl.html
Understanding IP addresses: http://www.3com.com/solutions/en_US/ncs/501302.html
Configuring TCP (RFC 1122): ftp://nic.merit.edu/internet/documents/rfc/rfc1122.txt
Assigned protocols, ports etc. (RFC 1010): http://www.es.net/pub/rfcs/rfc1010.txt & /etc/protocols


Backup Slides


SuperComputing


SC2004: Disk-Disk bbftp

The bbftp file transfer program uses TCP/IP.
UKLight path: London-Chicago-London. PCs: Supermicro + 3Ware RAID0. MTU 1500 bytes; socket size 22 Mbytes; RTT 177 ms; SACK off.
Move a 2 Gbyte file; Web100 plots:
Standard TCP: average 825 Mbit/s (bbcp: 670 Mbit/s).
Scalable TCP: average 875 Mbit/s (bbcp: 701 Mbit/s, ~4.5 s of overhead).
Disk-TCP-Disk at 1 Gbit/s is here!

[Figure: Web100 plots for Standard TCP and Scalable TCP: TCP achieved throughput (Mbit/s; instantaneous and average BW) and CurCwnd vs time (ms), 0 to 20,000 ms]


SC|05 HEP: Moving Data with bbcp

What is the end host doing with your network protocol? Look at the PCI-X buses.
3Ware 9000 controller, RAID0; 1 Gbit Ethernet link; 2.4 GHz dual Xeon; ~660 Mbit/s.
PCI-X bus with the RAID controller: read from disk for 44 ms every 100 ms.
PCI-X bus with the Ethernet NIC: write to network for 72 ms.
Power is needed in the end hosts, together with careful application design.


10 Gigabit Ethernet: UDP Throughput

A 1500 byte MTU gives ~2 Gbit/s; used a 16144 byte MTU, max user length 16080.
DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire rate throughput of 2.9 Gbit/s.
CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64 bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.7 Gbit/s.
SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.4 Gbit/s.

[Figure: an-al 10GE, Xsum, 512k buf, MTU 16114, 27 Oct 03: received wire rate (Mbit/s) vs spacing between frames (μs) for user data sizes 1472 to 16080 bytes]


10 Gigabit Ethernet: Tuning PCI-X

16080 byte packets every 200 µs; Intel PRO/10GbE LR adapter; PCI-X bus occupancy vs mmrbc.
Measured times, and times based on PCI-X times from the logic analyser: expected throughput ~7 Gbit/s, measured 5.7 Gbit/s.

[Figure: logic analyser traces for mmrbc 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s at 4096), showing CSR access, PCI-X sequence, data transfer, and interrupt & CSR update. Plots of PCI-X transfer time (μs) and rate (Gbit/s) vs max memory read byte count: measured rate, rate from expected time, and max PCI-X throughput; Kernel 2.6.1#17 HP Itanium Intel 10GE Feb04, and DataTAG Xeon 2.2 GHz]


10 Gigabit Ethernet: TCP Data Transfer on PCI-X

Sun V20z 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509; XFrame II NIC; PCI-X mmrbc 4096 bytes, 66 MHz.
Two 9000 byte packets back-to-back; average rate 2.87 Gbit/s.
Burst of packets: length 646.8 μs; gap between bursts 343 μs; 2 interrupts per burst.

[Figure: PCI-X bus activity showing CSR access and data transfer]


TCP on the 630 Mbit Link

Jodrell – UKLight – JIVE


TCP Throughput on 630 Mbit UKLight

Manchester gig7 to JBO mk5 606; 4 Mbyte TCP buffer.

[Figure: Web100 plots for tests 0, 1 and 2: TCP achieved throughput (Mbit/s, instantaneous BW) and CurCwnd vs time (s), 0 to 120 s; test 0 annotated “Dup ACKs seen” and “Other Reductions”]


Comparison of Send Time & 1-Way Delay

[Figure: send time (s) vs message number, zoomed; scale bar 100 ms; messages 76 to 102 (26 messages) marked]


1-Way Delay, 1448 Byte Messages

Route: Man-ukl-ams-prod-man; RTT 27 ms.
10,000 messages; message size 1448 bytes; wait time 0 μs; BDP = 3.4 MByte; TCP buffer 10 MByte.
The Web100 plot starts after 5.6 s due to clock sync.
~400 packets/10 ms: rate similar to iperf.

[Figure: 1-way delay (μs) vs packet number (scale bar 50 ms); Web100 PktsOut and PktsIn (delta) and CurCwnd (value) vs time (ms)]


Related Work: RAID, ATLAS Grid

RAID0 and RAID5 tests: 4th Year MPhys project, last semester.
Throughput and CPU load for different RAID parameters: number of disks, stripe size, user read/write size.
Different file systems: ext2, ext3, XFS.
Sequential file write and read; sequential file write and read with continuous background read or write.
Status: need to check some results & document; independent RAID controller tests planned.


HEP: Service Challenge 4

Objective: demo 1 Gbit/s aggregate bandwidth between RAL and 4 Tier 2 sites.
RAL has SuperJANET4 and UKLight links; RAL capped firewall traffic at 800 Mbit/s.
SuperJANET sites: Glasgow, Manchester, Oxford, QMWL.
UKLight site: Lancaster.
Many concurrent transfers from RAL to each of the Tier 2 sites: ~700 Mbit on UKLight; peak 680 Mbit on SJ4.

[Figure: RAL site network: Tier 1 and RAL Tier 2 (CPUs + disks, ADS caches, Oracle RACs) on 5510/5530 switch stacks, Router A and the UKLight router, 10 Gb/s and N x 1 Gb/s internal links, 4 x 1 Gb/s to CERN, 1 Gb/s to Lancaster, and a firewall (FW) with 1 Gb/s to SJ4]

Applications able to sustain high rates.
SuperJANET5, UKLight & new access links very timely.


Network Switch Limits Behaviour

End-to-end UDP packets from udpmon:
Only 700 Mbit/s throughput.
Lots of packet loss.
The packet loss distribution shows the throughput is limited.

[Figure: w05gva-gig6, 29 May 04, UDP: received wire rate (Mbit/s) and % packet loss vs spacing between frames (μs) for user data sizes 50 to 1472 bytes; 1-way delay (μs) vs packet number with a 12 μs wait, full run and zoom on packets 500 to 550]