
Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 1)

Collaborations in Networking and Protocols

HEP and Radio Astronomy

Richard Hughes-Jones, The University of Manchester

www.hep.man.ac.uk/~rich/ then “Talks”

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 2)

VLBI Proof of Concept at iGrid2002
European Topology: NRNs, GÉANT, Sites

[Map of the European topology, showing SuperJANET4, iGrid 2002, Manchester, Jodrell, SURFnet and JIVE]

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 3)

Some results of the e-VLBI Proof of Concept

[Traffic plots: normal traffic alone; normal traffic + Less Than Best Effort at 2.0 Gbit/s; normal traffic + radio astronomy data at 500 Mbit/s; normal traffic + radio astronomy data + Less Than Best Effort at 2.0 Gbit/s]

Collaboration: HEP, Radio Astronomy, DANTE, the NRNs, and campus folks

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 4)

e-VLBI at the GÉANT2 Launch, Jun 2005

[Map of the participating sites: Jodrell Bank (UK), Dwingeloo (DWDM link), Medicina (Italy), Torun (Poland)]

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 5)

e-VLBI UDP Data Streams
Collaboration: HEP, Radio Astronomy, DANTE, the NRNs, and campus folks
A good opportunity to test UDP throughput: 5 hour run

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 6)

ESLEA and UKLight
Exploiting Switched Lightpaths for e-Science Applications: EPSRC e-Science project, £1.1M, 11.5 FTE

Core technologies: protocols, control plane
Applications: HEP data transfers (ATLAS and D0), e-VLBI, medical applications, high-performance computing

Involved with protocols, HEP and e-VLBI; Stephen Kershaw appointed as RA (joint with EXPReS)

Investigate how well the protocol implementations work: UDP flows, TCP advanced stacks, DCCP (developed by UCL partners). Also examine how the applications "use" the protocols, and the effect of the transport protocol on what the application intended!

Develop real-time UDP transport for e-VLBI – vlbi_udp (a sketch of the idea follows)
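A minimal sketch of the vlbi_udp idea, assuming an illustrative destination, packet size and rate rather than the real tool's framing: pace UDP datagrams at a constant bit rate and carry a sequence number so the receiver can count loss and reordering.

```python
import socket
import struct
import time

# Illustrative parameters only; the real vlbi_udp has its own framing and rates.
DEST = ("receiver.example.org", 5000)   # hypothetical receiver
PAYLOAD_BYTES = 1400                    # VLBI data bytes per datagram
RATE_BPS = 512_000_000                  # target constant bit rate, 512 Mbit/s

def send_cbr(n_packets: int) -> None:
    """Send paced UDP datagrams, each starting with a 64-bit sequence number."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    interval = (PAYLOAD_BYTES + 8) * 8 / RATE_BPS   # seconds between datagrams
    payload = bytes(PAYLOAD_BYTES)
    next_send = time.perf_counter()
    for seq in range(n_packets):
        sock.sendto(struct.pack("!Q", seq) + payload, DEST)
        next_send += interval
        while time.perf_counter() < next_send:      # busy-wait for tight pacing
            pass

if __name__ == "__main__":
    send_cbr(100_000)
```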

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 7)

ESLEA and UKLight

6 * 1 Gbit transatlantic Ethernet layer-2 paths over UKLight + NLR
Disk-to-disk transfers with bbcp, Seattle to UK: set the TCP buffer and the application to give ~850 Mbit/s; one stream of data ran at 840-620 Mbit/s (buffer sizing sketched below)
Streamed UDP VLBI data, UK to Seattle, at 620 Mbit/s
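The TCP buffer setting is essentially a bandwidth-delay-product calculation. A sketch, assuming a Seattle-UK round-trip time of roughly 150 ms (an illustrative figure, not one measured on this path):

```python
import socket

TARGET_BPS = 850_000_000   # the ~850 Mbit/s aimed for per stream
RTT_S = 0.150              # assumed Seattle-UK round-trip time (illustrative)

# Bandwidth*Delay Product: bytes in flight on the path, so the socket buffer
# must be at least this large for a single stream to fill the pipe.
bdp_bytes = int(TARGET_BPS * RTT_S / 8)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
print(f"BDP ~ {bdp_bytes / 2**20:.1f} MiB")   # ~15 MiB for these numbers
```

On Linux the kernel's net.core.rmem_max/wmem_max limits also need to be large enough at both ends, or the requested buffer is clipped.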

[Plots: achieved rate (Mbit/s) vs time, 16:00-23:00, for the four SC|05 streams sc0501-sc0504 (each on a 0-1000 Mbit/s scale) and the aggregate UKLight SC|05 traffic (0-4500 Mbit/s scale); the reverse TCP traffic is also marked]

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 8)

tcpmon: TCP Activity for Remote Farms: Manc-CERN Req-Resp

[Plot: DataBytesOut and DataBytesIn (Web100 deltas) vs time, 0-2000 ms; Web100 hooks are used to read the TCP status]

Round-trip time 20 ms; 64-byte request (green), 1 Mbyte response (blue)
TCP in slow start: the 1st event takes 19 RTT, or ~380 ms

[Plot: DataBytesOut, DataBytesIn and CurCwnd (Web100 deltas/values) vs time, 0-2000 ms]

The TCP congestion window gets reset on each request: the TCP stack follows RFC 2581 & RFC 2861, reducing cwnd after inactivity
Even after 10 s, each response takes 13 RTT, or ~260 ms

[Plot: achievable TCP throughput (Mbit/s) and Cwnd vs time, 0-2000 ms]

Transfer achievable throughput: 120 Mbit/s
Event rate very low – the application is not happy! (A slow-start model is sketched below.)
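A back-of-envelope model of the slow-start cost, assuming an idealised congestion window that doubles every RTT from two segments. Real stacks grow more slowly (delayed ACKs, and the RFC 2581/2861 reset after idle), which is why 13-19 RTTs were measured rather than the ~9 of the idealised model.

```python
MSS = 1460                  # bytes per segment (typical Ethernet MSS; an assumption)
RESPONSE_BYTES = 1_000_000  # the 1 Mbyte response
RTT_MS = 20                 # measured Manchester-CERN round-trip time

cwnd, sent, rtts = 2 * MSS, 0, 0   # idealised slow start from 2 segments
while sent < RESPONSE_BYTES:
    sent += cwnd            # one congestion window delivered per round trip
    cwnd *= 2               # doubling each RTT (no delayed-ACK or ABC limits)
    rtts += 1

print(f"idealised: {rtts} RTTs ~ {rtts * RTT_MS} ms; "
      f"measured: 19 RTTs ~ 380 ms cold, 13 RTTs even after 10 s")
```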

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 9)

ESLEA: ATLAS on UKLight
1 Gbit lightpath Lancaster-Manchester, disk-to-disk transfers
Storage Element with SRM using distributed disk pools (dCache & xrootd)

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 10)

udpmon: Lanc-Manc throughput – not quite what we expected!!

Lanc → Manc: plateau at ~640 Mbit/s wire rate, no packet loss
Manc → Lanc: ~800 Mbit/s, but packet loss
Send times: a pause of 695 μs every 1.7 ms, so expect ~600 Mbit/s (estimate below)
Receive times (Manc end): no corresponding gaps
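One way to read those send-time pauses, taking the nominal GigE rate as ~1000 Mbit/s (an assumption for the estimate): the sender is idle for 695 μs out of every 1.7 ms, so only ~59% of the wire rate is usable, consistent with the ~600 Mbit/s expectation and the ~640 Mbit/s plateau.

```python
CYCLE_US = 1700.0          # observed spacing of the stalls in the udpmon send timestamps
PAUSE_US = 695.0           # observed length of each stall
WIRE_RATE_MBPS = 1000.0    # nominal GigE rate, assumed for the estimate

duty_cycle = (CYCLE_US - PAUSE_US) / CYCLE_US
print(f"duty cycle {duty_cycle:.2f} -> expect ~{duty_cycle * WIRE_RATE_MBPS:.0f} Mbit/s")
# duty cycle 0.59 -> expect ~591 Mbit/s
```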

[Plots (pyg13-gig1, 19 Jun 06): 1-way delay (μs) vs receive time and vs send time (0.1 μs units), and received wire rate (Mbit/s) vs inter-frame spacing (μs) for frame sizes from 50 to 1472 bytes]

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 11)

EXPReS & FABRIC

EU project to realise the current potential of e-VLBI and to investigate the next-generation capabilities.

SSA: use of Grid farms for distributed correlation; linking MERLIN telescopes to JIVE (the present correlator)
    4 * 1 Gigabit from Jodrell – links to 10 Gbit Service Challenge work
    Interface to eMERLIN – data at 30 Gbit/s
JRA (FABRIC): investigate the use of different IP protocols
    10 Gigabit, Onsala to Jodrell – links to 10 Gbit HEP work
    Investigate 4 Gigabit over GÉANT2 switched lightpaths, UDP and TCP – links to remote-compute-farm HEP work
    Develop 1 and 10 Gbit Ethernet end systems using FPGAs – links to CALICE HEP work

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 12)

FABRIC 4 Gigabit Demo
Will use a 4 Gbit lightpath between two GÉANT PoPs; collaboration with DANTE – discussions in progress
Continuous (days-long) data flows – VLBI_UDP and multi-gigabit TCP tests

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 13)

10 Gigabit Ethernet: UDP data transfer on PCI-X
Sun V20z, 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509; XFrame II NIC, PCI-X mmrbc 2048 bytes, 66 MHz
One 8000-byte packet: 2.8 μs for CSRs, 24.2 μs for the data transfer – effective rate 2.6 Gbit/s (calculation below)
2000-byte packets, wait 0 μs: ~200 ms pauses
8000-byte packets, wait 0 μs: ~15 ms between data blocks

[PCI-X trace: CSR access (2.8 μs) followed by the data transfer]
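The effective rate follows directly from those bus timings; a quick check, with the per-packet CSR overhead included for comparison:

```python
PKT_BYTES = 8000   # one 8000-byte packet
DATA_US = 24.2     # time on the PCI-X bus for the data transfer
CSR_US = 2.8       # CSR accesses needed per packet

def gbit_per_s(bits: float, microseconds: float) -> float:
    return bits / microseconds / 1000   # bits per microsecond = Mbit/s; /1000 -> Gbit/s

data_only = gbit_per_s(PKT_BYTES * 8, DATA_US)            # ~2.6 Gbit/s, as on the slide
with_csrs = gbit_per_s(PKT_BYTES * 8, DATA_US + CSR_US)   # ~2.4 Gbit/s including CSR overhead
print(f"data phase only: {data_only:.2f} Gbit/s, with CSRs: {with_csrs:.2f} Gbit/s")
```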

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 14)

CALICE

Virtex 4 board from PLD Applications: PCI Express development card
Using the FPGA to send and receive raw Ethernet frames at 1 Gigabit; package data from internal memory or an external source into Ethernet (a host-side sketch of the framing follows)
Considering building a 10 Gigabit Ethernet add-on card: take data in on the 1 Gig links, process it, and send the results out on the 10 Gig link
Using 2 boards (the 2nd as a data generator) we could produce a small-scale CALICE DAQ: take data in, buffer it to the DDR2 RAM, then read it out, Ethernet-frame it and ship it to PCs
Ideas for an Ethernet packet monitor

From slides by Marc Kelly
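For a host-side view of what that framing involves (the real work here is done in FPGA firmware, not software), a sketch of sending one raw Ethernet frame from Linux; the interface name, MAC addresses and EtherType are illustrative assumptions.

```python
import socket
import struct

IFACE = "eth0"                              # illustrative interface name
DST_MAC = bytes.fromhex("ffffffffffff")     # broadcast, for the sketch
SRC_MAC = bytes.fromhex("020000000001")     # locally administered test address
ETHERTYPE = 0x88B5                          # IEEE local-experimental EtherType

def send_frame(payload: bytes) -> None:
    """Wrap a payload in an Ethernet header and put it on the wire (needs root)."""
    frame = DST_MAC + SRC_MAC + struct.pack("!H", ETHERTYPE) + payload
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    s.bind((IFACE, 0))
    s.send(frame.ljust(60, b"\x00"))        # pad to the minimum Ethernet frame size

send_frame(b"data block from DDR2 buffer")  # stand-in for the FPGA memory contents
```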

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 15)

Backup Slides

Further network & end host investigations

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 16)

VLBI Work

TCP Delay and VLBI Transfers

Manchester 4th Year MPhys Project

by

Stephen Kershaw & James Keenan

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 17)

VLBI Network Topology

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 18)

VLBI Application Protocol

VLBI data is constant bit rate (CBR)
tcpdelay, an instrumented TCP program, emulates sending CBR data and records the relative 1-way delay (a sketch of the idea follows the diagram)

[Timing diagram: the sender timestamps each data block (Data1, Data2, ...) as it is sent; the receiver timestamps each arrival; packet loss and the RTT are marked on the sender-receiver exchange of segments and ACKs]

Segment time on wire = bits in segment / BW
Remember the Bandwidth*Delay Product: BDP = RTT * BW
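A minimal sketch of the tcpdelay idea (not the real instrumented program): send fixed-size messages over TCP at a constant bit rate, each carrying a send timestamp, so the receiver can log the relative 1-way delay. Host, port and rate here are illustrative.

```python
import socket
import struct
import time

MSG_BYTES = 1448            # message size used in the measurements below
RATE_BPS = 256_000_000      # illustrative CBR target

def run_sender(host: str, port: int, n_msgs: int) -> None:
    conn = socket.create_connection((host, port))
    interval = MSG_BYTES * 8 / RATE_BPS
    padding = bytes(MSG_BYTES - 8)
    next_send = time.time()
    for _ in range(n_msgs):
        conn.sendall(struct.pack("!d", time.time()) + padding)  # timestamp + padding
        next_send += interval
        time.sleep(max(0.0, next_send - time.time()))           # pace to the CBR rate

def run_receiver(port: int) -> None:
    srv = socket.socket()
    srv.bind(("", port))
    srv.listen(1)
    conn, _ = srv.accept()
    # Relative delay only: the two clocks are not synchronised, so the offset is arbitrary.
    while len(msg := conn.recv(MSG_BYTES, socket.MSG_WAITALL)) == MSG_BYTES:
        sent = struct.unpack("!d", msg[:8])[0]
        print(f"relative 1-way delay: {(time.time() - sent) * 1000:.2f} ms")
```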

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 19)

1-Way Delay – 10,000 packets

[Plot: 1-way delay (100 ms per division) vs message number]

10,000 messages; message size 1448 bytes; wait time 0; TCP buffer 64k
Route: Man-ukl-JIVE-prod-Man, RTT ~26 ms

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 20)

1-Way Delay Detail

[Plot: 1-way delay (10 ms per division) vs message number; steps marked at 1 x RTT (26 ms) and 1.5 x RTT – not 0.5 x RTT]

Why not just 1 RTT? After slow start the TCP buffer is full: messages at the front of the TCP send buffer have to wait for the next burst of ACKs, 1 RTT later, while messages further back in the send buffer wait for 2 RTTs (a buffer-limited rate estimate follows).
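The scale of the queueing is set by the small TCP buffer: with 64 kByte of buffer and a ~26 ms RTT the connection can carry only about 20 Mbit/s, so a CBR source any faster than that inevitably backs up in the send buffer and waits whole RTTs for ACKs. A quick check:

```python
BUFFER_BYTES = 64 * 1024   # the 64k TCP buffer used in these runs
RTT_S = 0.026              # Man-ukl-JIVE-prod-Man round-trip time

# Maximum rate a window/buffer of this size can sustain: buffer / RTT.
max_rate_mbps = BUFFER_BYTES * 8 / RTT_S / 1e6
print(f"buffer-limited throughput ~ {max_rate_mbps:.0f} Mbit/s")   # ~20 Mbit/s
```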

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 21)

Recent RAID Tests

Manchester HEP Server

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 22)

“Server Quality” Motherboards

Boston/Supermicro H8DCi: two dual-core Opterons at 1.8 GHz, 550 MHz DDR memory, HyperTransport
Chipset: nVidia nForce Pro 2200/2050 + AMD 8132 PCI-X bridge
PCI: 2 x 16-lane PCIe buses, 1 x 4-lane PCIe, 133 MHz PCI-X
2 x Gigabit Ethernet, SATA

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 23)

Disk tests: Areca PCI-Express 8-port controller, Maxtor 300 GB SATA disks
RAID0, 5 disks: read 2.5 Gbit/s, write 1.8 Gbit/s
RAID5, 5 data disks: read 1.7 Gbit/s, write 1.48 Gbit/s
RAID6, 5 data disks: read 2.1 Gbit/s, write 1.0 Gbit/s

[Plots (afs6, Areca 8-port PCIe, 10 Jun 06): throughput (Mbit/s) vs file size (Mbytes) for 8k reads and writes on the RAID0 (5-disk), RAID5 (5-disk) and RAID6 (7-disk) arrays]

Collaboration Meeting, 4 Jul 2006, R. Hughes-Jones, Manchester (slide 24)

UDP Performance: 3 Flows on GÉANT

Throughput, 5 hour run:
Jodrell to JIVE: 2.0 GHz dual Xeon to 2.4 GHz dual Xeon, 670-840 Mbit/s
Medicina (Bologna) to JIVE: 800 MHz PIII to mark623 (1.2 GHz PIII), 330 Mbit/s, limited by the sending PC
Torun to JIVE: 2.4 GHz dual Xeon to mark575 (1.2 GHz PIII), 245-325 Mbit/s, limited by security policing (>400 Mbit/s → 20 Mbit/s)?

Throughput over a 50 min period: the period of the variation is ~17 min

[Plots (BW, 14 Jun 05): received wire rate (Mbit/s) vs time (10 s steps) for the Jodrell, Medicina and Torun flows; full run and the 200-500 step detail]