Round the World Data Transfer
End of LSR (Land Speed Record) in 10Gbps era
Data Reservoir Project
Mary Inaba
University of Tokyo

First of all
A big apology for Kei’s absence
Hero of this year’s LSR achievement: Takeshi, in his experiment


What is Data Reservoir?
• Share scientific data over long distance
– Physics, astronomy, earth science, biology
• High-speed data transfer on Long Fat pipe Network
• Easy to use
– File system transparent

Data Reservoir System

[Figure: system diagram. Labels: User Programs, File Server, Disk Server, IP Switch, iSCSI Bulk Transfer, Global Network]

• Using iSCSI protocol
• Without any modification of applications

History of Data Reservoir and SC Bandwidth Challenge

1st Generation (SC02)
– 26 to 26 servers, 1GbE interface
– RTT 200ms, 90% usage of bottleneck OC-12

2nd Generation (SC03)
– Aggregated 10Gbps
– 24,000km, one and a half round trips between U.S. and Tokyo
– 32 to 32 servers, too many :-<

3rd Generation (SC04, SC05)
– Round the World, 31,248km
– 1 to 1, memory-to-memory transfer
– Single Stream, Longest Path, Standard MTU TCP Throughput Award
– Fastest IPv6

4th Generation (SC06)
– A pair of machines
– Disk-to-disk transfer: Single 7.2Gbps, Dual 8.65Gbps

Once upon a time,

There started an ambitious project

to construct an L2 network

between CERN and Tokyo

via Amsterdam, Canada, and U.S.

Fortunately ( ! ),

our team got a chance to try it  ♪

Network

[Figure: network map. Sites: Tokyo, Seattle, Vancouver, Calgary, Minneapolis, Chicago, Pittsburgh, Amsterdam, Geneva, CERN. Networks: WIDE, APAN/JGN II, IEEAF/Tyco/WIDE, CANARIE, SURFnet, Abilene]

3rd Generation Data Reservoir started

Background
• WAN PHY over the world
• Programmable 10GbE NIC is available

Challenge
How much bandwidth can we use with a single stream?

Struggles during the 1st experiment

Almost no information
– Ping + loopback is the only source
– Different network, different timezone
– TELEPHONE must be the most important equipment.

Over 7Gbps between Tokyo and CERN

A nice thing about this experiment was making a lot of new friends!

We really appreciate their good advice.

Submission to

Internet2 Land Speed Record

Experiments during the X’mas vacation,

the smallest traffic season!

Some Results

SC04 Bandwidth Challenge
– U.S. – Tokyo – U.S. – CERN: 31,248km, RTT 433ms, 7.57Gbps

Xmas Experiment
– Season with the smallest network traffic
– Very, very strict deadline for preparation
– Tokyo – Chicago – Amsterdam – Seattle – Tokyo: 33,979km, RTT 498ms, 7.21Gbps
– Updated the LSR 8 times


Challenge in 2006
To attain 90% of 10Gbps

The difficulty: WAN PHY (MAX 9.6Gbps) ⇔ LAN PHY
Only 4% of 10Gbps. But if RTT = 500ms, the difference is 25MBytes per round trip (TCP can control its transmission rate only at RTT granularity)

Another difficulty: PCI-X bottleneck → now cleared
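The 25MBytes figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the slide's "RTT = 500" means 500 ms and that the sender runs at the full 10 Gbps LAN PHY rate while the WAN PHY segment carries at most 9.6 Gbps; the function name is illustrative.

```python
def excess_bytes_per_rtt(sender_bps, bottleneck_bps, rtt_s):
    """Bytes sent beyond what the bottleneck can carry during one round trip."""
    return (sender_bps - bottleneck_bps) * rtt_s / 8

LAN_PHY_BPS = 10.0e9  # sender side, 10 Gbps
WAN_PHY_BPS = 9.6e9   # "MAX 9.6Gbps" from the slide
RTT_S = 0.5           # assumed: 500 ms

excess = excess_bytes_per_rtt(LAN_PHY_BPS, WAN_PHY_BPS, RTT_S)
print(f"{excess / 1e6:.0f} MB of excess data per round trip")
```

Since TCP adjusts its rate only about once per round trip, this is the amount of data that has to be buffered or dropped somewhere on the path before the sender can react.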

LSR in 2006 -- New players

• Circuit -- NetIron 40G, NetIron RX-4 in Seattle

• GSO (Generic Segmentation Offload) – Offloading CRC calculation

• Chelsio T310 -- PCI-X 2.0 support; IPG tuning is available

• Iperf modification with sendfile()

• Hardware approach for 10Gbit network – TAPEE: Network Analyzer

2006 LSR Challenge, again on X’mas

• Around Dec/10: Seattle line test
• Around Dec/20: Round-The-World up
• Dec/31: Submission
• Jan/8/2007: Round-The-World down

Host

• Xeon 5160 * 1
– Woodcrest core
– Dual core

• DDR400 2GB

• Chelsio T310-SR on PCI-Express x8
– There is no longer a bus speed bottleneck

• Linux 2.6.18

Circuit

• Round The World circuit
– 522ms RTT
– Trans Pacific & Trans Atlantic
– WAN PHY & LAN PHY mixed
– Tokyo – [Los Angeles] – Chicago – Amsterdam
– Amsterdam – [Chicago] – Seattle – Tokyo

LSR 200612-2 Network Topology

[Figure: topology of the round-the-world circuit across the Pacific and Atlantic Oceans.
Sites: Tokyo, Los Angeles, Seattle (Pacific Northwest Gigapop), Chicago (StarLight), NYC (MANLAN), Amsterdam (NetherLight at SARA).
Networks and exchanges: WIDE, JGN2, T-LEX, IEEAF, CANARIE CA*net 4, SURFnet, TransLight.
Equipment: end hosts Age-1 and Age-2 (Intel Xeon), Fujitsu XG800, Force10 E1200, Force10 E300, Foundry RX-4, Foundry NI40G, Cisco 7609, HDXc, ONS15454, GS4000.
Legend: L1 switch, L2 switch, L3 switch, Others; links marked WAN PHY or LAN PHY]

LSR distance

From                            To                              Distance
HND (35°33'08"N 139°46'47"E)    ORD (41°58'43"N 87°54'17"W)     10147 km
ORD (41°58'43"N 87°54'17"W)     AMS (52°18'31"N 04°45'50"E)      6630 km
AMS (52°18'31"N 04°45'50"E)     SEA (47°26'56"N 122°18'34"W)     7864 km
SEA (47°26'56"N 122°18'34"W)    HND (35°33'08"N 139°46'47"E)     7730 km

4 segment path: 32372 km
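The segment lengths can be approximated from the airport coordinates listed above. The sketch below uses the haversine formula on a sphere with a mean Earth radius of 6371 km. Both choices are assumptions, since the slide does not say how its distances were computed, so the results differ from the slide's figures by a fraction of a percent.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def dms(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value

# Coordinates as listed in the LSR distance table.
AIRPORTS = {
    "HND": (dms(35, 33, 8, "N"), dms(139, 46, 47, "E")),
    "ORD": (dms(41, 58, 43, "N"), dms(87, 54, 17, "W")),
    "AMS": (dms(52, 18, 31, "N"), dms(4, 45, 50, "E")),
    "SEA": (dms(47, 26, 56, "N"), dms(122, 18, 34, "W")),
}

def great_circle_km(a, b):
    """Haversine great-circle distance between two airports in AIRPORTS."""
    lat1, lon1 = map(math.radians, AIRPORTS[a])
    lat2, lon2 = map(math.radians, AIRPORTS[b])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

PATH = ["HND", "ORD", "AMS", "SEA", "HND"]
segments = [great_circle_km(a, b) for a, b in zip(PATH, PATH[1:])]
total_km = sum(segments)

for (a, b), km in zip(zip(PATH, PATH[1:]), segments):
    print(f"{a} -> {b}: {km:.0f} km")
print(f"4 segment path: {total_km:.0f} km")
```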

IPG Tuning

• Chelsio T310 has a special function for setting the IPG (Inter Packet Gap)
– Enables control of the Ethernet NIC transmission rate
– Up to 2048 octets (IEEE standard IPG is 12 octets)

• Fine-grain tuning
– For standard frames: control over 50 ~ 100%
– For 8000B jumbo frames: 80 ~ 100%
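The tuning ranges above follow from simple per-frame accounting. This is a rough model, not the NIC's documented behavior: it assumes each frame occupies its own length plus an 8-octet preamble plus the IPG on the wire, and it treats 1518 and 8000 octets as the frame sizes. The names are illustrative.

```python
PREAMBLE_OCTETS = 8      # preamble + start-of-frame delimiter
STANDARD_IPG_OCTETS = 12
MAX_IPG_OCTETS = 2048    # upper limit quoted on the slide

def relative_rate(frame_octets, ipg_octets):
    """Transmission rate relative to the same frames sent with the standard IPG."""
    standard_slot = frame_octets + PREAMBLE_OCTETS + STANDARD_IPG_OCTETS
    paced_slot = frame_octets + PREAMBLE_OCTETS + ipg_octets
    return standard_slot / paced_slot

for name, frame in (("standard 1518B frame", 1518), ("8000B jumbo frame", 8000)):
    lowest = relative_rate(frame, MAX_IPG_OCTETS)
    print(f"{name}: {lowest:.0%} to 100% of line rate")
```

Under these assumptions the largest IPG slows standard frames to about 43% and 8000B jumbo frames to about 80%, close to the ranges on the slide. Larger frames leave a narrower but finer-grained control range because the gap is a smaller share of each slot.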

[Figure: Without pacing (IPG 136), 600MB RWIN]

[Figure: Pacing (IPG 800), 600MB RWIN]

[Figure: Pacing (IPG 700), 600MB RWIN]

[Figure: Pacing (IPG 720), 600MB RWIN]
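The 600MB receive window can be related to the circuit's 522ms RTT through the bandwidth-delay product. A minimal sketch, assuming "600MB" means 600 × 10^6 bytes and that a single TCP stream can have at most one window of data in flight per round trip:

```python
def window_limited_bps(window_bytes, rtt_s):
    """Upper bound on single-stream throughput for a given window and RTT."""
    return window_bytes * 8 / rtt_s

def bandwidth_delay_product_bytes(rate_bps, rtt_s):
    """Bytes in flight needed to keep a path of this rate and RTT full."""
    return rate_bps * rtt_s / 8

RWIN_BYTES = 600e6  # assumed decimal megabytes
RTT_S = 0.522       # Round The World circuit RTT

limit_gbps = window_limited_bps(RWIN_BYTES, RTT_S) / 1e9
needed_mb = bandwidth_delay_product_bytes(9.08e9, RTT_S) / 1e6

print(f"600MB window allows up to {limit_gbps:.2f} Gbps")
print(f"9.08 Gbps needs about {needed_mb:.0f} MB in flight")
```

With these numbers the window allows just under 9.2 Gbps, so it sits slightly above what the reported 9.08 Gbps average requires.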

Iperf modification

• We have been using Iperf

• Iperf transmission flow
– Allocate several kB buffer
– Initialize buffer with random data
– while() { write(sock, buffer) }

• This invokes a copy between user and kernel space

Iperf modification (cont’d)

• Advice from Chelsio
– “Use netperf’s sendfile mode to confirm receiver performance”

• Modification
– Iperf-zerocopy transmission flow
  • open(temporary file) → file descriptor fd
  • buffer = mmap(fd)
  • initialize buffer with random data
  • while() { sendfile(sock, fd) }
– sendfile(2) sends data from kernel

• After some discussion, we concluded that using this version of Iperf meets the LSR rules
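The two transmission flows can be contrasted in a small sketch. This is not the project's modified Iperf. It is an illustration in Python that uses a local socket pair instead of a WAN connection and fills the temporary file with write() rather than mmap. `CHUNK`, `demo`, and the function names are hypothetical.

```python
import os
import socket
import tempfile
import threading

CHUNK = 64 * 1024  # illustrative buffer size ("several kB" on the slide)

def send_with_write(sock, rounds):
    """Standard flow: a user-space buffer is copied into the kernel on every send."""
    buffer = os.urandom(CHUNK)
    sent = 0
    for _ in range(rounds):
        sock.sendall(buffer)
        sent += len(buffer)
    return sent

def send_with_sendfile(sock, rounds):
    """Zero-copy style flow: the same file region is sent repeatedly from the kernel."""
    with tempfile.TemporaryFile() as f:
        f.write(os.urandom(CHUNK))
        f.flush()
        sent = 0
        for _ in range(rounds):
            # socket.sendfile() uses os.sendfile() where available, else falls back to send()
            sent += sock.sendfile(f, offset=0, count=CHUNK)
        return sent

def drain(sock, expected):
    """Receive and discard data until `expected` bytes arrive or the peer closes."""
    got = 0
    while got < expected:
        data = sock.recv(1 << 20)
        if not data:
            break
        got += len(data)
    return got

def demo(sender, rounds=4):
    """Run one sender against a local receiver thread; return (sent, received)."""
    a, b = socket.socketpair()
    result = {}
    reader = threading.Thread(target=lambda: result.update(got=drain(b, rounds * CHUNK)))
    reader.start()
    sent = sender(a, rounds)
    a.close()
    reader.join()
    b.close()
    return sent, result["got"]

print("write loop:   ", demo(send_with_write))
print("sendfile loop:", demo(send_with_sendfile))
```

Both loops deliver the same bytes. The difference is where the data comes from: the write loop copies from user space on every iteration, while the sendfile loop lets the kernel read the file pages directly.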

GSO

GSO + zerocopy

New submission

• 7.67Gbps average
– Standard-Iperf
– Peak 8.10Gbps, 20 minutes, no packet loss

• 9.08Gbps average
– Iperf-zerocopy
– Peak 9.11Gbps, 5 hours, no packet loss
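The LSR metric is a distance-bandwidth product, and the figures in this deck can be reproduced with a short calculation. The sketch assumes that path length is counted up to at most 30,000 km, as suggested by the "10 Gbps * 30,000km" reference line on the history charts. The cap and the function name are assumptions, not quoted rules.

```python
PETA = 1e15
DISTANCE_CAP_KM = 30_000  # assumed cap, inferred from the chart annotation

def distance_bandwidth_product(gbps, path_km, cap_km=DISTANCE_CAP_KM):
    """Distance-bandwidth product in Pbit m / s, with the path length capped."""
    counted_m = min(path_km, cap_km) * 1e3
    return gbps * 1e9 * counted_m / PETA

# 9.08 Gbps average over the 32,372 km four-segment path
print(f"{distance_bandwidth_product(9.08, 32_372):.1f} Pbit m / s")
# 7.21 Gbps over the 33,979 km path of the earlier Xmas experiment
print(f"{distance_bandwidth_product(7.21, 33_979):.1f} Pbit m / s")
```

Under this assumption the two runs come out at about 272 and 216 Pbit m / s, which matches the values labeled on the history charts.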

History of single-stream IPv4 Land Speed Record

[Figure: chart. X axis: Year, 2000 to 2007. Y axis: Distance bandwidth product (Pbit m / s), ticks at 1, 10, 100, 1,000. Reference line: 10 Gbps * 30,000km.
Labeled points:
– 2004/11/9, Data Reservoir project / WIDE project: 149 Pbit m / s
– 2004/12/24: 216 Pbit m / s
– 2005/11/10: 240 Pbit m / s
– 2006/2/20: 264 Pbit m / s]

History of single-stream IPv6 Land Speed Record

[Figure: chart. X axis: Year, 2000 to 2007. Y axis: Distance bandwidth product (Pbit m / s), ticks at 1, 10, 100, 1,000. Reference line: 10 Gbps * 30,000km.
Labeled points:
– 2004/10/29, Data Reservoir project / WIDE project: 167 Pbit m / s
– 2005/11/13, Data Reservoir project / WIDE project: 208 Pbit m / s
– 2006/12/28, Data Reservoir project / WIDE project: 272 Pbit m / s]